Google Blames Itself after It Blames StopBadWare
February 3, 2009
When you are really smart, it is tough to see yourself as anything other than — well — fault free. Google, according to the UK Telegraph, first blamed StopBadWare.org for the January 31, 2009, glitch that flagged every search result as malware. Then Google changed its tune and admitted that its wizards made an error more typical of a first year computer science student than a Googler. You can read the Telegraph’s illuminating story here. A happy quack to Alastair Jamieson and Urmee Khan, who wrote “Google Blames Wrong ‘Human Error’ for Stopping Millions from Finding Web Pages”. Ah, the GOOG is able to make errors on a global scale. How reassuring to some competitors. Are you eager to trust this outfit with your enterprise data? This addled goose marvels at hubris of this ilk.
Stephen Arnold, February 3, 2009
Lexalytics’ Jeff Caitlin on Sentiment and Semantics
February 3, 2009
Editor’s Note: Lexalytics is one of the companies most closely identified with analyzing text for sentiment. When a flow of email contains a negative message, Lexalytics’ system can flag that email. In addition, the company can generate data that provides insight into how people “feel” about a company or product. I am simplifying, of course. Sentiment analysis has emerged as a key content processing function, and, as with other language-centric tasks, the methods are of increasing interest.
Jeff Caitlin will speak at what has emerged as the “must attend” search and content processing conference in 2009. The Infonortics’ Boston Search Engine meeting features speakers who have an impact on sophisticated search, information processing, and text analytics. Other conferences respond to public relations; the Infonortics’ conference emphasizes substance.
If you want to attend, keep in mind that attendance at the Boston Search Engine Meeting is limited. To get more information about the program, visit the Infonortics Ltd. Web site at www.infonortics.com or click here.
The exclusive interview with Jeff Caitlin took place on February 2, 2009. Here is the text of the interview conducted by Harry Collier, managing director of Infonortics and the individual who created this content-centric conference more than a decade ago. Beyond Search has articles about Lexalytics here and here.
Will you describe briefly your company and its search / content processing technology?
Lexalytics is a Text Analytics company that is best known for our ability to measure the sentiment or tone of content. We plug in on the content processing side of the house, and take unstructured content and extract interesting and useful metadata that applications like Search Engines can use to improve the search experience. The types of metadata typically extracted include: Entities, Concepts, Sentiment, Summaries and Relationships (Person to Company for example).
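To make that list of metadata types concrete, here is a minimal sketch of the kind of record such an engine might hand to a search application. The field names and sample values are illustrative assumptions, not Lexalytics’ actual output format.

# Illustrative only: the shape of metadata a text analytics engine
# might attach to a document before it reaches the search index.
document = (
    "Acme Corp. reported record earnings. CEO Jane Doe said the "
    "results were 'outstanding' and thanked the engineering team."
)

metadata = {
    "entities": [
        {"text": "Acme Corp.", "type": "Company"},
        {"text": "Jane Doe", "type": "Person"},
    ],
    "concepts": ["earnings", "corporate results"],
    "sentiment": {"score": 0.8, "label": "positive"},   # document-level tone
    "summary": "Acme Corp. reported record earnings.",
    "relationships": [
        {"subject": "Jane Doe", "relation": "CEO of", "object": "Acme Corp."}
    ],
}

# A search engine can index these fields alongside the raw text,
# enabling queries such as "positive news about Acme Corp."
print(metadata["sentiment"]["label"])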
With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?
The simple fact that machines aren’t smart like people and don’t actually “understand” the content they are processing… or at least they haven’t to date. The new generation of text processing systems have advanced grammatical parsers that are allowing us to tackle some of the nasty problems that have stymied us in the past. One such example is anaphora resolution, sometimes referred to as “pronominal reference”, which is a bunch of big confusing sounding words for the understanding of pronouns. Take the sentence, “John Smith is a great guy, so great that he’s my kid’s godfather and one of the nicest people I’ve ever met.” For people this is a pretty simple sentence to parse and understand, but for a machine it has given us fits for decades. Now with grammatical parsers we understand that “John Smith” and “he” are the same person, and we also understand who the speaker is and what the subject is in this sentence. This enhanced level of understanding is going to improve the accuracy of text parsing and allow for a much deeper analysis of the relationships in the mountains of data we create every day.
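To see why pronoun resolution matters to a machine, here is a toy Python sketch that links pronouns to the most recently mentioned name. It is a minimal illustration under simplifying assumptions, not the grammatical parser Lexalytics describes; real systems rely on full syntactic analysis, gender, and number agreement.

import re

# Toy anaphora resolution: link each third-person pronoun to the most
# recently mentioned person name. This only shows the idea.
PRONOUNS = {"he", "she", "him", "her", "his"}

def resolve_pronouns(text, known_people):
    tokens = re.findall(r"[A-Za-z]+", text)
    last_person = None
    links = []
    i = 0
    while i < len(tokens):
        # Greedily match multi-word names such as "John Smith".
        for name in known_people:
            parts = name.split()
            if tokens[i:i + len(parts)] == parts:
                last_person = name
                i += len(parts) - 1
                break
        if tokens[i].lower() in PRONOUNS and last_person:
            links.append((tokens[i], last_person))
        i += 1
    return links

sentence = ("John Smith is a great guy, so great that he's my kid's "
            "godfather and one of the nicest people I've ever met.")
print(resolve_pronouns(sentence, ["John Smith"]))
# [('he', 'John Smith')]  -- the pronoun resolves to John Smith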
What is your approach to problem solving in search and content processing? Do you focus on smarter software, better content processing, improved interfaces, or some other specific area?
Lexalytics is definitely on the better content processing side of the house. Our belief is that you can only go so far by improving the search engine… eventually you’re going to have to make the data better to improve the search experience. This is 180 degrees apart from Google, which focuses exclusively on the search algorithms. That works well for Google in the web search world, where you have billions of documents at your disposal, but it hasn’t worked as well in the corporate world, where finding information isn’t nearly as important as finding the right information and helping users understand why it’s important and who understands it. Our belief is that metadata extraction is one of the best ways to learn the “who” and “why” of content so that enterprise search applications can really improve the efficiency and understanding of their users.
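A brief sketch of what “making the data better” can look like in practice: attach extracted metadata to each document before indexing, so the application can filter and rank on it. The crude extractor and the tiny in-memory index below are stand-ins for illustration, not any vendor’s pipeline.

# Sketch: enrich documents with extracted metadata before indexing,
# so the "engine" can answer richer questions than keyword match alone.
# The extractor here is a stand-in; a real pipeline would call a
# text analytics engine for entities and sentiment.
def extract_metadata(text):
    entities = [w for w in text.split() if w.istitle()]   # crude entity guess
    tone = "negative" if "problem" in text.lower() else "positive"
    return {"entities": entities, "sentiment": tone}

corpus = [
    "Acme reported strong quarterly results.",
    "Customers reported a problem with the Acme update.",
]

index = []
for doc_id, text in enumerate(corpus):
    record = {"id": doc_id, "text": text}
    record.update(extract_metadata(text))
    index.append(record)

# Now the application can ask: which documents mention Acme negatively?
hits = [r["id"] for r in index
        if "Acme" in r["entities"] and r["sentiment"] == "negative"]
print(hits)  # [1]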
With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search / content processing?
For Lexalytics the adverse business climate has altered the mix of our customers, but to date it has not affected the growth in our business (Q1 2009 should be our best ever). What has clearly changed is the mix of customers investing in Search and Content Processing; we typically run about 2/3 small companies and 1/3 large companies. In this environment we are seeing a significant uptick in large companies looking to invest as they seek to increase their productivity. At the same time, we’re seeing a significant drop in the number of smaller companies looking to spend on Text Analytics and Search. The net-net of this is that, if anything, Search appears to be one of the areas that will do well in this climate, because data volumes are going up and staff sizes are going down.
Microsoft acquired Fast Search & Transfer. SAS acquired Teragram. Autonomy acquired Interwoven and Zantaz. In your opinion, will this consolidation create opportunities or shut doors? What options are available to vendors / researchers in this merger-filled environment?
As one of the vendors that works closely with two of the three major Enterprise Search vendors, we see these acquisitions as a good thing. FAST, for example, seems to be a well-run organization under Microsoft, and they seem to be very clear on what they do and what they don’t do. This makes it much easier for both partners and smaller vendors to differentiate their products and services from all the larger players. As an example, we are seeing a significant uptick in leads coming directly from the Enterprise Search vendors that are looking to us for help in providing sentiment/tone measurement for their customers. Though these mergers have been good for us, I suspect that won’t be the case for all vendors. We work with the enterprise search companies rather than against them; if you compete with them, these mergers may make it even harder to be considered.
As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?
The biggest change is going to be the move away from entities that are explicitly stated within a document toward a more ‘fluffy’ approach. This encompasses directly stated relationships – “Joe works at Big Company Inc” – but it also encompasses being able to infer the same information from a less direct statement, such as “Joe got in his car and drove, like he did every day, to his job at Big Company Inc.” It also covers things like processing reviews and understanding that sound quality is a feature of an iPod from the context of the document, rather than from a predefined list. And it encompasses things of a more semantic nature, such as understanding that a document talking about Congress is also talking about Government, even though Government might not be explicitly stated.
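The Congress-implies-Government point can be illustrated with a toy concept expansion routine. The hand-built taxonomy below is invented for the example; a production system would rely on a far larger ontology and on inference rather than a lookup table.

# Sketch of concept expansion: a document that mentions "Congress" is
# also tagged with the broader concept "Government", even though that
# word never appears. The tiny taxonomy is hand-built for illustration.
BROADER = {
    "congress": "government",
    "senate": "government",
    "ipod": "consumer electronics",
    "sound quality": "product feature",
}

def expand_concepts(text):
    text_lower = text.lower()
    found = {term for term in BROADER if term in text_lower}
    return found | {BROADER[t] for t in found}

doc = "Congress debated the bill late into the night."
print(expand_concepts(doc))
# {'congress', 'government'}  (set order may vary)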
Graphical interfaces and portals (now called composite applications) are making a comeback. Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009? What semantic considerations do you bring to your product and research activities?
One of the key uses of semantic understanding in the future will be in understanding what people are asking or complaining about in content. It’s one thing to measure the sentiment for an item that you’re interested in (say a digital camera), but it’s quite another to understand the items that people are complaining about while reviewing that camera and noting that “the battery life sucks”. We believe that joining the subject of a discussion to the tone of that discussion will be one of the key advancements in semantic understanding that takes place in the next couple of years.
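A minimal sketch of joining the subject of a complaint to its tone appears below. The word lists and the clause-splitting rule are illustrative assumptions, not how Lexalytics scores sentiment.

import re

# Sketch: pair the thing being discussed (the aspect) with the tone of
# the clause that mentions it.
NEGATIVE = {"sucks", "terrible", "awful", "poor"}
POSITIVE = {"great", "excellent", "sharp", "fast"}
ASPECTS = ["battery life", "lens", "autofocus", "screen"]

def aspect_sentiment(review):
    results = {}
    # Score each clause separately so one complaint does not bleed into
    # the tone of a neighbouring aspect.
    for clause in re.split(r"[,.;]", review.lower()):
        tone = "neutral"
        if any(w in clause.split() for w in NEGATIVE):
            tone = "negative"
        elif any(w in clause.split() for w in POSITIVE):
            tone = "positive"
        for aspect in ASPECTS:
            if aspect in clause:
                results[aspect] = tone
    return results

print(aspect_sentiment("The lens is great, but the battery life sucks."))
# {'lens': 'positive', 'battery life': 'negative'}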
Where can I find out more about your products, services and research?
Lexalytics can be found on the web at www.lexalytics.com. Our Web log discusses our thoughts on the industry: www.lexalytics.com/lexablog. A downloadable trial is available here. We also have prepared a white paper, and you can get a copy here.
Harry Collier, February 3, 2009
Enterprise Wiki Faux Pas
February 3, 2009
Take a look at this PC World article “Three Myths of the Enterprise Wiki” here. Unlike some of the azure chip consultants who come up with baloney, Janus Boye and his team produce Kobe beef. I don’t want to spoil your fun by summarizing the write up. I can, however, point out the comment that I found most suggestive:
“People often look to Wikipedia as a free form where everyone is contributing, and why could we not do the same with our organization?,” she said, having observed wikis entering the scene to compensate for an intranet that has fallen to the wayside. But, she said, technology alone won’t resolve that issue.
Spot on. Technology does not solve problems and sit happily in the shade. Technology spawns more challenges. Skip the azure chip consultants in the US and head to Denmark. For more info about the Boye organization, click here.
Stephen Arnold, February 2, 2009
Mysteries of Online 3: Free versus Fee Information
February 3, 2009
I have been thinking about Chris Anderson’s article “The Economics of Giving It Away” here in the Wall Street Journal, Saturday, January 31, 2009. I spoke with a couple of people and read the thoughtful posts that are plentiful. A useful one is Staci Kramer’s “‘Long Tail’ Author Anderson: Free Doesn’t Work As A Standalone Business Model” here.
Mr. Anderson is a clever wordsmith, and he’s pretty good with numbers. I don’t disagree too much with his analysis. I have encountered similar arguments over the years, and now I understand where the mind set that generates these interesting analyses comes from.
I come from a different mental space, and I think the free – fee duality has several twists that keep the economists thinking and the pundits punditing. Economists and pundits have quite a track record in financial matters in the last nine months.
I want to capture some of the ideas that have not been developed in my previous monographs that touch upon this subject. Relevant information appeared in Publishing on the Internet: A New Medium for a New Millennium (Infonortics, 1996) and in New Trajectories of the Internet: Umbrellas, Traction, Lift, and Other Phenomena (Infonortics, 2001). Both of these publications are out of print. These two specialist monographs were written years ago and may have been among the first to address some of the free – fee issues. I will not repeat that information, confining my comments to the information I have in my notes and not in my for fee work. While not an ideal approach from the point of view of today’s reader, it keeps within the guidelines I set for this Web log: a combination of recycled ideas and some new thoughts I don’t want to let slip away.
Feel free to challenge these ideas. I am still struggling with them.
When a Person Will Pay
In the free – fee arena, I want to give an example of when you, gentle reader, will probably pay anything for specific information. A commercial database, now owned by Thomson (a dead tree publisher working hard to keep up revenues as it tries to reinvent itself), is Poisondex. The idea is that when you know what poison is in a person’s system, you can consult the database and get pointers for saving the person’s life. Poisondex is a pay to use database. It is not free. There are situations in which you personally could access the database, but I will comment on those in a moment. Back to the main point.
Your three year old child is dying. You think she ate a household substance. The doctor tells you that he has to consult Poisondex before taking steps to save your child’s life. The nature of poison is such that it is possible to accelerate its effects unless the doc has the specific information at hand. The doctor tells you that you have to pay to save your child’s life. Let’s say that the cost of the three minute database search is $1,000.
Will you agree to pay?
In my experience, most Americans, even trophy kids or azure chip consultants with progeny, will say, “Yes. Get the data.”
I have asked focus groups about this situation over the years, and the answers range from “I will pay anything” to “Are you crazy? Of course, my wife and I will pay.” Okay, so now we have established that you will pay for information, and most people don’t ask, “How much?”
The reasons are:
- Your child’s life to you is priceless. You will literally pay anything. This is the reason why bad guys can get people to do almost anything by threatening a child. So much for security unless the victims have had special training.
- You assume that the source is going to be accurate. You “trust” the doctor. You don’t know enough about Poisondex to “trust” its data, but the doctor “certifies” that the information is what’s needed to save your child’s life. In fancy Dan talk, this is provenance of information. You have to know, either as a subject matter expert yourself or via a proxy, that Poisondex will not include information that will increase the likelihood that your child will die.
- Your decision process is necessarily constrained. In short, you don’t have the luxury of time. You don’t negotiate. You don’t call around and get information from contacts whom you think “know” something useful to you at this decision point.
We have, therefore, established with some certainty that information has “value”; in this case, $1,000. How much more will you pay right now? You perceive that your child’s life is worth more to you than any “cost” the doctor or the database imposes on you to save your child’s life. We also know that time is limited, so you can’t stand around and think about searching Google or calling your roomie from your days at Yale University.
You decide. You pay. You wait to see if the child lives.
For certain information, therefore, a market exists. If information is “free” but does not have the provenance to allow you to make a decision quickly, you will almost always go with the information for which a proxy or your own perception says, “This for fee information is probably better than this free information.” You don’t know whether the information in Poisondex is right. You don’t know if the information in Yahoo’s results list is right. You go with your perception.
Digital Textbook Start Up
February 2, 2009
A textbook start up seems unrelated to search. It’s not. You can read about Flatworld, an open source variant, set up to make money with educational materials here. The company wants to offer online books for free. Hard copies carry a price tag. Glyn Moody, who wrote “Flatworld: Open Textbooks” for Open…, made this interesting comment:
It’s too early to tell how this particular implementation will do, but I am absolutely convinced this open textbook approach will do to academic publishing what open source has done to software.
Dead tree educational publishers take note. Change is coming and really fast. Gutenberg gave printing a boost. Online gives a new publishing medium a similar shove.
Stephen Arnold, February 2, 2009
Google Orkut from Calcutta’s Angle
February 2, 2009
I applied for an Orkut account but was ignored. The GOOG wants my goose cooked. Mark Ghosh was a beta tester of Orkut. He created a community which attracted more than 20,000 users. You can read his article “Et Tu Google? Then Fail, Net Safety” here. (I quite like the Shakespearian lilt to the title too.) With social systems and social search all the rage, Mr. Ghosh points out his perception of Google’s management of and commitment to Orkut. For me, the most interesting comment in the article was:
The Orkut application itself is full of holes and though Google seems to respond to major public reports of vulnerabilities, they keep coming back. Support for Orkut from Google is almost non-existent with what appears to be zero accountability. If one plows through the Google help sections to try and solicit help, they are either faced with a page not found or convoluted help screens that barely ever actually lead to a form to request support. Pleas for help are more often answered by the “Orkut hackers” than by actual Google employees. The Orkut application is so dangerous that people do not click on any links that are not Orkut generated and even then accounts and communities are compromised all the time. Hacking scripts and techniques are easily found via a simple Google search.
Google’s gaggle of genius gravitons may be preoccupied with flagging the Internet as malware and giving great Google Earth demonstrations to the Davos attendees. Orkut in general and Mr. Ghosh’s issues in particular seem to be of less significance to Google, in my opinion. Mr. Ghosh’s article may point out another interesting example of Google’s muffing the bunny.
Stephen Arnold, February 2, 2009
Frank Bandach, Chief Scientist, eeggi on Semantics and Search
February 2, 2009
An Exclusive Interview by Infonortics Ltd. and Beyond Search
Harry Collier, managing director and founder of the influential Boston Search Engine Meeting, interviewed Frank Bandach, chief scientist, eeggi, a semantic search company, on January 27, 2009. eeggi has maintained a low profile. The interview with Mr. Bandach is among the first public descriptions of the company’s view of the fast-changing semantic search sector.
The full text of the interview appears below.
Will you describe briefly your company and its search technology?
We are a small new company implementing our very own new technology. Our technology is framed in a rather controversial theory of natural language, exploiting the idea that language itself is a predetermined structure, and as we grow, we simply feed it new words to increase its capabilities and its significance. In other words, our brains did not learn to speak; we were rather destined to speak. Scientifically speaking, eeggi is a mathematical clustering structure which models natural language, and therefore, some portions of rationality itself. Objectively speaking, eeggi is a linguistic reasoning and rationalizing analysis engine. As a linguistic reasoning engine, it is then only natural that we find ourselves cultivating search, but also other technological fields such as speech recognition, concept analysis, responding, irrelevance removal, and others.
What are the three major challenges you see in search in 2009?
The way I perceive this is that many of the challenges facing search in 2009 (irrelevance, nonsense, and ambiguity) are the same ones that were faced in previous years. I think that simply our awareness and demands are increasing, and thus require smarter and more accurate results. This is, after all, the history of evolution.
With search decades old, what have been the principal barriers to resolving these challenges in the past?
These problems (irrelevance, nonsense, and ambiguity) have so far been addressed through Artificial Intelligence. However, AI is branched into many areas and disciplines, and AI is also currently evolving and changing. Our approach is unique and follows a completely different attitude, or if I may say, spirit than that of current AI disciplines.
What is your approach to problem solving in search? Do you focus on smarter software, better content processing, improved interfaces, or some other specific area?
Our primary approach is machine intelligence focusing on zero irrelevance, while allowing for synonyms, similarities, rational disambiguation of homonyms or multi-conceptual words, dealing with collocations as unit concepts, grammar, permitting rationality and, finally, information discovery.
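One way to read “collocations as unit concepts” is to merge known multi-word expressions into single tokens before any further analysis. The sketch below is a guess at the general idea, not a description of eeggi’s linguistic structures; the phrase list is invented for illustration.

# Sketch: treat known collocations as single unit concepts before any
# further analysis, so "hot dog" is not processed as "hot" + "dog".
COLLOCATIONS = ["hot dog", "search engine", "natural language"]

def tokenize_with_units(text):
    text = text.lower()
    for phrase in COLLOCATIONS:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    return text.split()

print(tokenize_with_units("A search engine should know a hot dog is food."))
# ['a', 'search_engine', 'should', 'know', 'a', 'hot_dog', 'is', 'food.']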
With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search?
The immediate impact of a weak economy affects all industries, but before long the impact will be absorbed and will disappear. The future belongs to technology. This is indeed the principle that was ignited long ago with the industrial revolution. It is true, the world faces many challenges ahead, but technology is the reflection of progress, and technology is uniting us day by day, allowing, and at times forcing us, to understand, accept, and admit our differences. For example, unlike ever before, the United States and India are now becoming virtual neighbors thanks to the Internet.
Search systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search becoming increasingly integrated into enterprise applications? If yes, how will this shift affect the companies providing stand alone search / content processing solutions? If no, what do you see the role of standalone search / content processing applications becoming?
From our standpoint, search, translation, speech recognition, machine intelligence… for all matters, language itself… all fall under a single umbrella, which we identify through a Linguistic Reasoning and Rationalization Analysis engine we call eeggi.
Is that an acronym?
Yes. eeggi is shorthand for “engineered, encyclopedic, global and grammatical identities”.
As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?
I truly believe that users will become more and more critical of irrelevance and the quality of their results. New generations will be, and are, more aware and demanding of machine performance. For example, in my youth two little bars and a square in the middle represented a tennis match, and it was an exciting experience; by today’s standards, presenting the same scenario to a kid would become a laughing matter. As newer generations move in, foolish results will not form part of their minimum expectations.
Mobile search is emerging as an important branch of search. Mobile search, however, imposes some limitations on presentation and query submission. What are your views of mobile search’s impact on more traditional enterprise search / content processing?
This is a very interesting question… The functionalities and applications from several machines inevitably begin to merge the very instant that technology permits miniaturization or when a single machine can efficiently evolve and support the applications of the others. Most of the time, it is the smallest machine that wins. It is going to be very interesting to see how cell phones move, more and more, into fields that were reserved exclusively for computers. It is true that cell phones, by nature, need to integrate small screens, but new folding screens and even projection technologies could make for much larger screens, and as Artificial Intelligence takes on challenges previously available only through user performance, screens themselves may move into a secondary function. After all, you and I are now talking without using any visual aid or, for that purpose, a screen.
Where can I find more information about your products, services, and research?
We are still a bit in stealth mode. But we have a Web site (eeggi.com) that displays and discusses some basic information. We hope that by November of 2009 we will have built sufficient linguistic structures to allow eeggi to move into automatic learning of other languages with little, or possibly, no aid from natural speakers or human help.
Thank you.
Harry Collier, Managing Director, Infonortics Ltd.
eeggi Founder Interviewed
February 2, 2009
eeggi (the acronym stands for “engineered, encyclopedic, global and grammatical identities”) is a semantic search system with a mathematical foundation, and Frank Bandach is its chief scientist. You can view demonstrations and get more information here. eeggi has kept a low profile, but Mr. Bandach will deliver one of the presentations at the Infonortics’ Boston Search Engine Meeting in April 2009. You can get more information about the conference at www.infonortics.com or click here.
Beyond Search will post Mr. Bandach’s interview, conducted by Harry Collier, on February 1, 2009. In the interval before the April Boston Search Engine Meeting, other interviews and information will be posted here as well. Mr. Collier, managing director of Infonortics, has granted permission to ArnoldIT.com to post the interviews as part of the Search Wizards Speak Web series here.
The Boston Search Engine Meeting is the premier event for search, content processing, and text analytics. If you attend one search-centric conference in 2009, the Boston Search Engine Meeting is the one for your to-do list. Other conferences tackle search without the laser focus of the Infonortics’ program committee. In fact, outside of highly technical events sponsored by the ACM, most search conferences wobble across peripheral topics and Web 2.0 trends. Not the Boston Search Engine Meeting. As the interview with eeggi’s senior manager reveals, Infonortics tackles search and content processing with speakers who present useful insights and information.
Unlike other events, attendance at the Infonortics Boston Search Engine Meeting is limited. The program recognizes speakers for excellence with the Ev Brenner award, selected by such search experts as Dr. Liz Liddy (Dean, Syracuse University), Dr. David Evans (Justsystem, Tokyo), and Sue Feldman (IDC’s vice president of search technology research). Some conferences use marketers, journalists, or search newbies to craft a conference program. Not Mr. Collier. You meet attendees and speakers who have a keen interest in search technology, innovations, and solutions. Experts in search engine marketing find the Boston Meeting foreign territory.
Click here for the interview with Frank Bandach, eeggi.
Stephen Arnold, February 1, 2009
BA-Insight Points to Strong 2009
February 2, 2009
In an exclusive interview for the Search Wizards Speak series, Guy Mounier, one of the senior managers at BA-Insight, looks for a strong 2009. The company grew rapidly in 2008. Although BA-Insight is privately held, Mr. Mounier said, “We are profitable and have been experiencing rapid growth.” You can read the full text of this interview here.
One of the most interesting comments made by Mr. Mounier was:
BA-Insight is the top Enterprise Search ISV Partner of Microsoft. We are a Managed Partner, a status reserved for 200 MS Partners worldwide, and a Global Alliance Member of Microsoft Technology Centers. We are also part of the Google Compete Team. Our software extends the Microsoft Enterprise Search platform, it does not replace it. In fact, our software is not a Search engine. It is a critical differentiator from other ISV’s in the information access sector. We focus exclusively on plug-and-play connectors to enterprise systems, and advanced search experience on top of MS Enterprise Search and MS SharePoint. We will support FAST in the Office 14 time frame.
BA-Insight has found a lucrative niche. The company adds a turbo boost to its clients’ Microsoft systems. With its support for Google systems, BA-Insight is poised to take advantage of that company’s push into organizations as well.
Mr. Mounier told Search Wizards Speak:
Our next major release is scheduled for end of 2009 and will target the next version of SharePoint (Office 14). We will add significant improvements in the form of automatic metadata extraction, dynamic data visualization, and on-the-fly document assembly.
On the subject of having Microsoft as a partner, Mr. Mounier said:
Microsoft is actually a great company to partner with. Their Solution Sales Professionals, responsible for technical solution sales, always reach out to the partner ecosystem, SI’s or ISV’s, to put forth a solution to the customer on top of the Microsoft platform. Microsoft is a significant contributor to our sales pipeline. We conduct regular webinars and other events with their field sales force to stay top of mind, as many partners are competing for their attention. This has been rather easy as of late, as search becomes increasingly strategic to them. The other benefit of being a top partner of Microsoft is that we get visibility into their product pipeline, typically 18 months or more, that our competitors do not have. We know of their future product investments, and can make sure we stay aligned with their roadmap, adding new features that don’t collide with theirs.
For more information about BA-Insight, navigate to the company’s Web site at www.ba-insight.com or click here.
Stephen Arnold, February 2, 2009
Google GMail Goes Wonky
February 1, 2009
It’s been a tough January 31, 2009, for the addled goose. As noted early this morning, one of the world’s smartest people at Google muffed the bunny, marking sites as malware. Then Digital Lifestyle’s Simon Perry noted that GMail was marking good messages as spam. You can read his story “Google’s Gmail Now Freaking Out: Mis-marking Mail As Spam” here. He wrote:
Gmail appears to be putting legitimate emails in the Spam folder and also showing the following warning message above many emails …Warning: This message may not be from whom it claims to be. Beware of following any links in it or of providing the sender with any personal information.
Yep, Googzilla is really good at details. The world’s smartest people had it right. Those emails were probably spam to the smart algorithms that Googlers create.
Stephen Arnold, February 1, 2009