Google’s Medical Probe
February 5, 2009
Yikes, a medical probe. Quite an image for me. In New York City at one of Alan Brody’s events in early 2007, I described Google’s “I’m feeling doubly lucky” invention. The idea was search without search. One example I used to illustrate search without search was a mobile device that could monitor a user’s health. The “doubly lucky” metaphor appears in a Google open source document and suggests that a mobile device can react to information about a user. In one use case, I suggested, Google could identify a person with a heart problem and summon assistance. No search required. The New York crowd sat silent. One person from a medical company asked, “How can a Web search and advertising company play a role in health care?” I just said, “You might want to keep your radar active.” In short, my talk was a bust. No one had a clue that Google could do mobile, let alone mobile medical devices. Those folks probably don’t remember my talk. I live in rural Kentucky and clearly am a bumpkin. But I think when some of the health care crowd read “Letting Google Take Your Pulse” in the oh-so-sophisticated Forbes Magazine, on February 5, 2009, those folks will have a new pal at trade shows. Googzilla is in the remote medical device monitoring arena. You can read the story here–just a couple of years after Google disclosed the technology in a patent application. No sense in rushing toward understanding the GOOG when you are a New Yorker, is there? For me, the most interesting comment in the Forbes write up was:
For IBM, the new Google Health functions are also a dress rehearsal for “smart” health care nationwide. The computing giant has been coaxing the health care industry for years to create a digitized and centrally stored database of patients’ records. That idea may finally be coming to fruition, as President Obama’s infrastructure stimulus package works its way through Congress, with $20 billion of the $819 billion fiscal injection aimed at building a new digitized health record system.
Well, better to understand too late than never. Next week I will release a service to complement Overflight to allow the suave Manhattanites an easy way to monitor Google’s patent documents. The wrong information at the wrong time can be hazardous to a health care portfolio in my opinion.
Stephen Arnold, February 5, 2009
Lexalytics’ Jeff Caitlin on Sentiment and Semantics
February 3, 2009
Editor’s Note: Lexalytics is one of the companies that is closely identified with analyzing text for sentiment. When a flow of email contains a negative message, Lexalytics’ system can flag that email. In addition, the company can generate data that provides insight into how people “feel” about a company or product. I am simplifying, of course. Sentiment analysis has emerged as a key content processing function, and like other language-centric tasks, the methods are of increasing interest.
Jeff Caitlin will speak at what has emerged as the “must attend” search and content processing conference in 2009. The Infonortics’ Boston Search Engine meeting features speakers who have an impact on sophisticated search, information processing, and text analytics. Other conferences respond to public relations; the Infonortics’ conference emphasizes substance.
If you want to attend, keep in mind that attendance at the Boston Search Engine Meeting is limited. To get more information about the program, visit the Infonortics Ltd. Web site at www.infonortics.com or click here.
The exclusive interview with Jeff Caitlin took place on February 2, 2009. Here is the text of the interview conducted by Harry Collier, managing director of Infonortics and the individual who created this content-centric conference more than a decade ago. Beyond Search has articles about Lexalytics here and here.
Will you describe briefly your company and its search / content processing technology?
Lexalytics is a Text Analytics company that is best known for our ability to measure the sentiment or tone of content. We plug in on the content processing side of the house, and take unstructured content and extract interesting and useful metadata that applications like Search Engines can use to improve the search experience. The types of metadata typically extracted include: Entities, Concepts, Sentiment, Summaries and Relationships (Person to Company for example).
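To make those metadata categories concrete, here is a deliberately simple sketch. This is my own illustration, not Lexalytics’ software or output format; the word lists, regular expression, and field names are all hypothetical.

```python
import re

# Toy opinion lexicon -- illustrative only, not a vendor word list.
POSITIVE = {"great", "nicest", "loves"}
NEGATIVE = {"sucks", "terrible"}

def extract_metadata(text):
    """Toy pipeline: pull capitalized multi-word spans as 'entities',
    score sentiment from a tiny word list, and truncate for a summary."""
    entities = re.findall(r"(?:[A-Z][a-z]+ )+[A-Z][a-z]+", text)
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return {
        "entities": entities,      # e.g. named people and companies
        "sentiment": score,        # negative .. positive integer score
        "summary": text[:60],      # naive extractive "summary"
    }

print(extract_metadata("John Smith works at Big Company Inc and loves his job."))
```

A real system would use trained models rather than word lists, but the shape of the output record is the point: search engines consume this metadata to improve the search experience.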
With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?
The simple fact that machines aren’t smart like people and don’t actually “understand” the content they are processing… or at least they haven’t to date. The new generation of text processing systems has advanced grammatical parsers that allow us to tackle some of the nasty problems that have stymied us in the past. One such example is anaphora resolution, sometimes referred to as “pronominal reference”, which is a bunch of big confusing-sounding words for the understanding of pronouns. Take the sentence, “John Smith is a great guy, so great that he’s my kid’s godfather and one of the nicest people I’ve ever met.” For people this is a pretty simple sentence to parse and understand, but for a machine it has given us fits for decades. Now with grammatical parsers we understand that “John Smith” and “he” are the same person, and we also understand who the speaker is and what the subject is in this sentence. This enhanced level of understanding is going to improve the accuracy of text parsing and allow for a much deeper analysis of the relationships in the mountains of data we create every day.
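The pronoun problem described above can be illustrated with a deliberately naive heuristic. This is a sketch for illustration only, not Lexalytics’ grammatical parser: it simply binds each pronoun to the most recent multi-word capitalized span, ignoring the syntax, gender, and number agreement a real parser would check.

```python
import re

PRONOUNS = {"he", "she", "they", "him", "her", "them"}

def resolve_pronouns(text):
    """Naive anaphora resolution: map each pronoun to the most recent
    multi-word capitalized span seen so far. A real grammatical parser
    checks syntax and gender/number agreement; this sketch does not."""
    bindings = []
    last_entity = None
    # Walk the text left to right; capitalized runs count as names.
    for match in re.finditer(r"(?:[A-Z][a-z]+(?: [A-Z][a-z]+)+)|[A-Za-z]+", text):
        token = match.group()
        if token[0].isupper() and " " in token:
            last_entity = token                  # e.g. "John Smith"
        elif token.lower() in PRONOUNS:
            bindings.append((token, last_entity))
    return bindings

print(resolve_pronouns("John Smith is a great guy, so great that he is my kid's godfather."))
# -> [('he', 'John Smith')]
```

The heuristic fails as soon as two names compete for one pronoun, which is exactly why this problem stymied machines for decades.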
What is your approach to problem solving in search and content processing? Do you focus on smarter software, better content processing, improved interfaces, or some other specific area?
Lexalytics is definitely on the better content processing side of the house; our belief is that you can only go so far by improving the search engine… eventually you’re going to have to make the data better to improve the search experience. This is 180 degrees apart from Google, which focuses exclusively on the search algorithms. That works well for Google in the web search world, where you have billions of documents at your disposal, but it hasn’t worked as well in the corporate world, where finding information isn’t nearly as important as finding the right information and helping users understand why it’s important and who understands it. Our belief is that metadata extraction is one of the best ways to learn the “who” and “why” of content so that enterprise search applications can really improve the efficiency and understanding of their users.
With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search / content processing?
For Lexalytics the adverse business climate has altered the mix of our customers, but to date it has not affected the growth in our business (Q1 2009 should be our best ever). What has clearly changed is the mix of customers investing in Search and Content Processing; we typically run about 2/3 small companies and 1/3 large companies. In this environment we are seeing a significant uptick in large companies looking to invest as they seek to increase their productivity. At the same time, we’re seeing a significant drop in the number of smaller companies looking to spend on Text Analytics and Search. The net-net of this is that, if anything, Search appears to be one of the areas that will do well in this climate, because data volumes are going up and staff sizes are going down.
Microsoft acquired Fast Search & Transfer. SAS acquired Teragram. Autonomy acquired Interwoven and Zantaz. In your opinion, will this consolidation create opportunities or shut doors. What options are available to vendors / researchers in this merger-filled environment?
As one of the vendors that works closely with two of the three major Enterprise Search vendors, we see these acquisitions as a good thing. FAST, for example, seems to be a well-run organization under Microsoft, and they seem to be very clear on what they do and what they don’t do. This makes it much easier for both partners and smaller vendors to differentiate their products and services from those of the larger players. As an example, we are seeing a significant uptick in leads coming directly from the Enterprise Search vendors, which are looking to us for help in providing sentiment/tone measurement for their customers. Though these mergers have been good for us, I suspect that won’t be the case for all vendors. We work with the enterprise search companies rather than against them; if you compete with them, these mergers may make it even harder to be considered.
As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?
The biggest change is going to be the move away from entities that are explicitly stated within a document to a more ‘fluffy’ approach. While this encompasses directly stated relationships – “Joe works at Big Company Inc” – it also encompasses being able to infer the same information from a less direct statement: “Joe got in his car and drove, like he did every day, to his job at Big Company Inc.” It also covers things like processing reviews and understanding that sound quality is a feature of an iPod from the context of the document, rather than from a specific list. And it encompasses things of a more semantic nature, such as understanding that a document talking about Congress is also talking about Government, even though Government might not be explicitly stated.
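The Congress-implies-Government example can be sketched as a walk up a broader-than taxonomy. The table below is a hand-built toy for illustration; it is not how eeggi, Lexalytics, or any other vendor stores its knowledge.

```python
# Toy is-a taxonomy: each term points to its broader concept.
BROADER = {
    "Congress": "Legislature",
    "Legislature": "Government",
    "iPod": "Portable Media Player",
    "Portable Media Player": "Consumer Electronics",
}

def implied_concepts(term):
    """Follow broader-than links so a document mentioning 'Congress'
    is also tagged with 'Legislature' and 'Government'."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain

print(implied_concepts("Congress"))  # -> ['Legislature', 'Government']
```

An indexing pass that tags documents with the implied concepts, not just the literal terms, lets a query for “government” retrieve a document that only ever says “Congress”.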
Graphical interfaces and portals (now called composite applications) are making a comeback. Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009? What semantic considerations do you bring to your product and research activities?
One of the key uses of semantic understanding in the future will be in understanding what people are asking or complaining about in content. It’s one thing to measure the sentiment for an item that you’re interested in (say it’s a digital camera), but it’s quite another to understand the items that people are complaining about while reviewing a camera and noting that the “the battery life sucks”. We believe that joining the subject of a discussion to the tone for that discussion will be one of the key advancements in semantic understanding that takes place in the next couple of years.
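Joining the subject of a complaint to its tone can be sketched with a window-based heuristic. Again, this is a toy of my own, not Lexalytics’ method: find a known product feature in a review, then score the opinion words that appear near it.

```python
# Hypothetical feature and opinion lists for illustration.
FEATURES = {"battery life", "sound quality", "screen"}
OPINIONS = {"sucks": -1, "terrible": -1, "great": 1, "superb": 1}

def aspect_sentiment(review):
    """For each known feature mentioned in the review, score the
    opinion words that appear within ~40 characters of the mention."""
    results = {}
    lowered = review.lower()
    for feature in FEATURES:
        idx = lowered.find(feature)
        if idx == -1:
            continue
        window = lowered[max(0, idx - 40): idx + len(feature) + 40]
        results[feature] = sum(v for w, v in OPINIONS.items() if w in window)
    return results

print(aspect_sentiment("Nice camera overall, but the battery life sucks."))
# -> {'battery life': -1}
```

The interesting research problem, as the interview notes, is doing this without a hand-built feature list: learning from context that “battery life” is an aspect of a camera in the first place.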
Where can I find out more about your products, services and research?
Lexalytics can be found on the web at www.lexalytics.com. Our Web log discusses our thoughts on the industry: www.lexalytics.com/lexablog. A downloadable trial is available here. We also have prepared a white paper, and you can get a copy here.
Harry Collier, February 3, 2009
eeggi Founder Interviewed
February 2, 2009
Frank Bandach is Chief Scientist at eeggi (the acronym stands for “engineered, encyclopedic, global and grammatical identities”), a semantic search system with a mathematical foundation. You can view demonstrations and get more information here. eeggi has kept a low profile, but Mr. Bandach will deliver one of the presentations at the Infonortics’ Boston Search Engine Meeting in April 2009. You can get more information about the conference at www.infonortics.com or click here.
Beyond Search will post Mr. Bandach’s interview conducted by Harry Collier on February 1, 2009. In the interval before the April Boston Search Engine meeting, other interviews and information will be posted here as well. Mr. Collier, managing director of Infonortics, has granted permission to ArnoldIT.com to post the interviews as part of the Search Wizards Speak Web series here.
The Boston Search Engine Meeting is the premier event for search, content processing, and text analytics. If you attend one search-centric conference in 2009, the Boston Search Engine Meeting is the one for your to-do list. Other conferences tackle search without the laser focus of the Infonortics’ program committee. In fact, outside of highly technical events sponsored by the ACM, most search conferences wobble across peripheral topics and Web 2.0 trends. Not the Boston Search Engine Meeting. As the interview with eeggi’s senior manager reveals, Infonortics tackles search and content processing with speakers who present useful insights and information.
Unlike other events, the Infonortics Boston Search Engine Meeting attendance is limited. The program recognizes speakers for excellence with the Ev Brenner award, selected by such search experts as Dr. Liz Liddy (Dean, Syracuse University), Dr. David Evans (Justsystem, Tokyo), and Sue Feldman (IDC’s vice president of search technology research). Some conferences use marketers, journalists, or search newbies to craft a conference program. Not Mr. Collier. You meet attendees and speakers who have a keen interest in search technology, innovations, and solutions. Experts in search engine marketing find the Boston Meeting foreign territory.
Click here for the interview with Frank Bandach, eeggi.
Stephen Arnold, February 1, 2009
SurfRay Management
January 31, 2009
I did some poking around on Friday, January 30, 2009. One of the more interesting items was confirmation that SurfRay president Bill Cobb resigned from SurfRay earlier this month. There has been some chatter that he was forced out of the company. That’s not exactly on the money. I spoke with Mr. Cobb and he is interested in exploring leadership opportunities. With regard to the future of SurfRay, I can say with confidence that the company is indeed sailing in rough waters against a headwind. If an investment firm is poised to acquire the assets, that deal will have to be made quickly. In my opinion, time is running out for SurfRay.
Stephen Arnold, January 31, 2009
Semantic Universe Sighted
January 31, 2009
A happy quack to the reader who alerted me to this Yahoo News story here about the Semantic Universe. According to Tony Shaw, Editor of Semantic Universe Network, “The semantic community needs a vehicle to communicate the comprehensive business applications and benefits of semantic technology, as well as a better way to connect developers, customers, entrepreneurs and investors. Semantic Universe Network will be that vehicle.” Sponsorship opportunities are available too. You can get additional information from the Web site here. The goslings at Beyond Search wish the information service well.
Stephen Arnold, January 31, 2009
Facebook and Twitter: Who Owns What
January 30, 2009
If a Facebook or Twitter fails, what happens? What a silly question. According to Jeremy Liew, Facebook is “pretty comfortable” about where the company is “right now”. You can find this statement and quite a bit of useful commentary in the article “Warning: Dependence on Facebook, Twitter Could Be Hazardous to Your Business” here. For me the most important comment in the write up by Mark Glaser was:
If you are planning on using either Twitter or Facebook as a marketing platform for yourself or your business, be sure to read the Terms of Service carefully. That’s what Facebook’s Larry Yu advised when I talked to him. “The important thing for people to do is to review the Terms of Service,” he said. “A lot of people don’t do that. They don’t have experience with it, and we encourage people to do it…There are also terms for application developers. As people decide to develop on the platform, they have to be comfortable with those terms.”
This addled goose is wary of social networks. Some trophy generation denizens believe that they don’t exist unless they are providing information on these publishing platforms. The trophy kids want to “hook up” and keep their “friends” informed about their activities and whereabouts. When one of the trophy kids becomes a person of interest to law enforcement, those social postings are going to be somewhat useful to certain authorities. I wonder if the trophy kids realize that some information which is innocuous at the time it becomes available might provide insights to a more informed thinker. Run a query for profiling and see what you think of that discipline. Finally, there’s a nifty tool called the Analyst’s Notebook. If you are not familiar with it, run a Google query for that puppy. From my point of view the information “in” social systems is fascinating. Technology is an interesting construct. The consequences of technology can be even more interesting. Think search, content processing, link analysis, clustering, and other useful methods crunching on those social data. Yum, yum.
Stephen Arnold, January 31, 2009
Exalead: Moving the Front Line
January 26, 2009
A happy quack to the reader in California who sent me an update on Exalead. In the last 10 days, I have received a steady flow of news. The company continues to make headway in the US market.
The company has announced CloudView OEM Edition 5.0. This is a version of the product that can be embedded in third-party applications. The product has been designed for independent software vendors and software as a service providers. The OEM edition includes performance improvements with tweaks to make embedding easier and quicker. The product, as I understand it, can be used to add search and sophisticated content processing functions to email, CMS, call center, and other information centric applications.
Paul Doscher, CEO of Exalead, said:
As the use of traditional Web and Web 2.0 technologies including wikis, instant messaging, social networking, and collaboration has proliferated within the enterprise, users have come to expect the same simplicity, speed, and scale from their enterprise software providers. The challenge for ISVs is to provide that same experience in their search capabilities without sacrificing the security and precision required for enterprise use. Exalead CloudView OEM Edition helps them deliver on that challenge.
(Note: you can read an exclusive January 2008 Beyond Search interview with Mr. Doscher here.)
Features of the new product include:
- Ability to deal with petabytes of data
- Aggregation, collation, and normalization of data from disparate structured and unstructured sources; for example, HTML, Microsoft Office documents and other files scattered across corporate servers, data located at SaaS providers, active and archived e-mail, relational data, proprietary application data, etc.
- Support for fuzzy and precise relevancy
- Small CPU and disk footprints
- Scalability to handle spikes
- High peak user concurrency
- Support for existing interfaces, security models, and data sources
- Multi-language support.
In my April 2008 Gilbane Group report Beyond Search I highlighted Exalead’s architectural advantage. Based on my research, Exalead and Google tackle scaling and performance in somewhat similar ways. (Note: the founder of Exalead was a senior AltaVista.com engineer. You can read an interview with François Bourdoncle here.)
Microsoft-Nortel Parallel
January 23, 2009
Matthew Nickasch’s “Could Microsoft Become Another Nortel?” here is an article that would not have occurred to us in Harrod’s Creek, Kentucky. We don’t think too much about non-search vendors, and Nortel is not a player in the space we monitor. Microsoft is a search vendor. The company has Web search; various test search systems, which you can follow here; Powerset (based on long-standing Xerox technology); and Fast Search & Transfer (a Web search company that morphed into enterprise search, then publishing systems, and now into conference management).
Mr. Nickasch picks up the theme of the layoffs at Microsoft that were triggered by the firm’s financial results reported in January 2009. For me, the most interesting comment in the article was:
Many large companies have much to learn from the recent events of Nortel, who filed for bankruptcy protection last week. Organizations with disjunct structures and complexly-integrated business functions need to critically evaluate their overall business structure.
I am not a fan of MBA speak, but I absolutely agreed with the use of the word “disjunct”. That is a very nice way of saying disorganized, confused, and addled (just like the goose writing this Web log). Nortel, once a giant, is now a mouse. A mouse in debt at that.
Three notions were triggered by Mr. Nickasch’s apt juxtaposition.
First, could this be the start of a more serious effort to break up Microsoft? Unlike Nortel (Canadian debt, government involvement, global competition), Microsoft could be segmented easily. Shareholders would get a boost from a break up in my view.
Second, what happens to orphans – big dollar acquisitions that have a modest profile in today’s competitive enterprise market? I hear about SharePoint. I hear about Silverlight. I even hear about Windows Mobile. I don’t hear about ESP. In case you have forgotten, that’s not paranormal insight; that’s enterprise search platform.
Third, what’s the positioning of on-premises software versus cloud software? Microsoft has quite a few brands and is at risk in terms of making clear what tool, system, service, and feature is associated with what product line.
In my opinion, I think Mr. Nickasch has forged a brilliant pairing. A happy quack to him.
Stephen Arnold, January 23, 2009
Mark Logic: A Lifesaver for Content Producing Organizations
January 23, 2009
Technology has to reduce costs, streamline product production, and deliver a competitive advantage. Without a payoff, the zippiest technology is not worth too much. Beyond Search casts a skeptical eye on technology that is disconnected from the real-world of today’s financial crises.
This week, the Beyond Search team was given a briefing about some interesting new functions and services for the Mark Logic platform. If you are not familiar with Mark Logic, click here to read an interview with the company’s senior manager, Dave Kellogg, or click here to access Mark Logic’s description of its next generation data management platform and content processing system. (The demos are quite useful by the way.)
What’s new?
First, Mark Logic has developed a MarkLogic Connector for SharePoint. There are somewhere in the neighborhood of 100 million SharePoint licenses in the world at this time (January 2009). Microsoft provides some basic tools, but for industrial strength content manipulation, the Mark Logic platform with its support for industry standards like the XQuery language and Open XML adds beef to the anorexic SharePoint frame.
With the connector, an organization can move content automatically from SharePoint into the Mark Logic system. Once in the Mark Logic environment, the content can be sliced, diced, indexed, classified, and repurposed. In fact, once set up, the SharePoint system feeds content into an automated publishing system that is more agile than the multimillion dollar enterprise publishing systems that IBM and Hewlett Packard are pushing on their customers.
Mark Logic’s SharePoint connector includes an added bonus. If a SharePoint system goes south, the content in the Mark Logic system can be made available to a rebuilt SharePoint system. The benefit is that Mark Logic adds a no-cost insurance policy for SharePoint. Although a solid product, SharePoint has idiosyncrasies, and the Mark Logic platform, the new “active library mirroring”, and the “workflow integration” components give content producing organizations levers on which to boost their competitive advantage.
Think automation. Think cost control. Think product agility.
Second, Mark Logic Toolkit for Word is a new and much needed rework of a 25 year old technology. In the 1980s, publishing and content producing organizations used XyWrite III+ to provide writers with a controlled writing environment. The writer would type into the XyWrite system. When the file was saved, XyWrite would perform important housekeeping functions automatically. For example, a newspaper company could ask a reporter to file using XyWrite. When the story was complete, XyWrite would insert the reporter’s name and title, insert standard tags, and include the typesetting codes so the story could flow into a DEC 20 ITPS publishing system or its equivalent. XyWrite was acquired by IBM and orphaned hundreds of publishing customers, including the US House of Representatives to name one.
Now Mark Logic has created a XyWrite for the 21st century. The idea is that a content producing organization can build an application with the MarkLogic Toolkit for Word. The application runs in Word. Since many professionals work exclusively in Word, almost any content task can be automated in part. The idea is to remove certain burdensome tasks like inserting metadata into a document, copying it to a specific location, and notifying a colleague that the draft is available for inclusion in another document. The Toolkit makes it possible to eliminate some of the human tasks in order to reduce content production delays and minimize errors introduced via repetitive tasks. Humans get tired. Software does not. If you want additional detail, click here.
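The housekeeping the Toolkit automates can be illustrated with a plain-file sketch. The file paths, metadata fields, and notification step below are all hypothetical, and this is not Mark Logic’s API; it simply shows the XyWrite-style pattern of stamping a draft with metadata, copying it to a shared location, and recording a notification.

```python
from datetime import date
from pathlib import Path

def file_story(draft: Path, author: str, outbox: Path, log: Path) -> Path:
    """Toy version of the housekeeping described above: prepend byline
    metadata, copy the draft to a shared folder, and append a line to
    a notification log for the next person in the workflow."""
    stamped = f"AUTHOR: {author}\nDATE: {date.today().isoformat()}\n\n" + draft.read_text()
    outbox.mkdir(parents=True, exist_ok=True)
    target = outbox / draft.name          # copy lands in the shared folder
    target.write_text(stamped)
    with log.open("a") as fh:             # crude stand-in for notifying a colleague
        fh.write(f"{draft.name} filed by {author}\n")
    return target

# Example usage (paths are hypothetical):
# file_story(Path("story.txt"), "S. Arnold", Path("outbox"), Path("filed.log"))
```

The point is the same as with XyWrite: the writer just saves; the metadata stamping, routing, and notification happen without a human in the loop, which is where the delay and error reduction come from.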
Mark Logic interface for Word. © Mark Logic, 2009. Used with permission of Mark Logic.
Net Net
Beyond Search tracks innovations in data and information management. Mark Logic is an interesting company because it points the way content manipulation systems are moving; namely:
- Open standards and easy extensibility
- Practical functions that reduce the costs associated with content production and manipulation
- Support for applications that are–whether one likes SharePoint and Word or not–the standard for certain organizational tasks.
The inclusion of a work flow component adds to the usefulness of the Mark Logic solution. We’re impressed. A happy quack to the engineering team at Mark Logic. Now, what’s next?
Stephen Arnold, January 23, 2009
New Google Study Announced
January 21, 2009
In July 2007, I vowed, “No more Google studies.” I was tired. Now I am just about finished with my third analysis of Google’s technology and business strategy. The two are intertwined. My publisher (Harry Collier, Infonortics Ltd.) has posted some preliminary information here about the forthcoming monograph, Google: The Digital Gutenberg. If you are curious how a Web search engine can be a digital Gutenberg, you will find this analysis of Google’s newest information technology useful. None of the information in this monograph has appeared in the more than 1,200 posts on this Web log, in my two previous Google studies, or in my more than 200 publicly available articles, columns, and talks.
In short, the monograph will contain new information.
If you are involved in traditional media as a distributor, producer, content creator, aggregator, reseller, indexer, or user, you will find the monograph useful. You may get a business idea or two. If you are the nervous type, the monograph will give some ideas on which to chew. This study represents more than one year of research and analysis. I don’t pay much attention to the received wisdom about Google. I do focus almost exclusively on the open source information about Google’s technology using journal articles, presentations, and patent documents. The result is a look at Google that is quite different from the Google-is-an-advertising-agency approach that continues to dominate discourse. Even the recent chatter about Google’s semantic technology is old hat if you read my previous Google monographs. In short, I think this third study provides a solid look at what Google will be unveiling in the period between mid 2009 and the end of 2010. Here are the links to my two earlier studies.
- The Google Legacy. Describes how Google’s search system became an application platform. You know this today, but my analysis appeared in early 2005.
- Google Version 2.0. Explores Google’s semantic technology and the company’s innovations that greased the skids for applications, enterprise solutions, and disintermediation of commercial database publishers. A recent podcast broke the old news just a few days ago. Suffice it to say that most pundits were unaware of the scope and scale of Google’s semantic innovations. Cluelessness is reassuring, just not helpful when trying to assess a competitive threat in my opinion.
I don’t have the energy to think about a fourth Google study, but this trilogy does provide a reasonably comprehensive view of Google’s technical infrastructure. I know from feedback from Googlers that the information about some of Google’s advanced technology is not widely known among Google’s rank and file employees. Google’s top wizards know, but these folks are generally not too descriptive about Google’s competitive strengths. Most pundits are happy to get a Google mouse pad or maybe a Google baseball hat. Not me. I track the nitty gritty and look past the glow of the lava lamps. I don’t even like Odwalla strawberry banana juice.
Stephen Arnold, January 21, 2009