Search: Still in Its Infancy
March 9, 2009
Click here and read the job postings for intelligence professionals. Notice that the skills are those that require an ability to manipulate information, not just in English but in other languages. Here’s a portion of one posting:
Core Collector-certified Collection Management Officers (CMO’s) oversee and facilitate the collection, evaluation, classification, and dissemination of foreign intelligence developed from clandestine sources. CMO’s play a critical role in ensuring that foreign intelligence collected by clandestine sources is relevant,
I keep reading that search is stable and search is simple. I don’t think so. Because language is complex, the challenge for search and content processing vendors is significant. With more than 150 systems available to manipulate information, one would think that software could handle basic collection and analysis, right? Software helps, but search is still in its infancy. The source of the jobs? The US Central Intelligence Agency, which is reasonably well equipped with search, text processing, and content analysis systems. Too bad the reality of search is complex, but some find it easy to say the problem is solved and move on in a fog of wackiness.
Stephen Arnold, March 9, 2009
Microsoft Bets on Improved Web Search
March 9, 2009
I saw this story on March 4, 2009, and I came back to it today (March 8, 2009). I thought I could locate my Microsoft Web search timeline. Alas, it eludes me. I have been keeping track of the “improvements” and other Web search initiatives for a number of years. The list is of modest interest. The entries are little more than a sequence of dates and the Web search actions Microsoft took. When Microsoft bought Powerset, a provider of semantic search demonstrated on Wikipedia (a popular corpus for vendors), I made a note: July 2008, Powerset technology based in part on older Xerox PARC semantic components.
The story “Microsoft Eyes Better Searches, Bigger Market Share” via Newsfactor but available to me here said:
Microsoft is testing features that will give searchers organized results to save time, according to Nadella [the Microsoft search wizard du jour]. A feature has been added on the left side of the results pages to give users access to tools to help complete various tasks. The company has also added other features like single-session history and hover preview.
What I found more interesting was the data (maybe assertions?) included in the write up; for example:
- 40 percent of search queries go unanswered
- Half of the queries are about searchers returning to previous tasks
- 46 percent of sessions are longer than 20 minutes.
As I read this, I thought back to the phone call I received when I pointed out that search was pretty awful. The person on that call whose name I can’t recall told me that Microsoft had a system that made my criticism of search in general inapplicable for Microsoft. That call was in 2006 when I was finishing the third and final edition of the Enterprise Search Report that I wrote. (Hooray! I was done with a 600 page encyclopedia).
But this news story made it clear to me at least that search is a work in progress. And the issues addressed in the article and highlighted with the data above suggest to me that Microsoft wants to move from Web search to some richer information centric application; for example, “tools”.
Google enjoys a big lead over Ask.com, Microsoft, and Yahoo in Web search. Over the last year, Google has maintained its lead and in some sectors increased it as Ask.com and Microsoft lost share and Yahoo held steady or experienced fractional increases in usage.
The secret to Web search is anchored in traffic. Lots of traffic dilutes many search sins. Microsoft has to generate traffic. That’s a tough job, and I don’t think a new brand and tools will do the job. Microsoft has tried this before. I remember weird little butterflies stuck to buildings and sidewalks in New York. If I had my timeline, I would have the date. Seems like only yesterday.
Stephen Arnold, March 9, 2009
Microsoft: One More Search System
March 9, 2009
TechCrunch’s information technology edition reported here that Microsoft has inked a deal with ZoomInfo.com. You can read the story “ZoomInfo Scores Deal With Microsoft To Integrate Search Into CRM” here. ZoomInfo’s technology extracts information and generates information about people and companies. The TechCrunch description of ZoomInfo is here. ZoomInfo is a business search system and content generation engine. The resulting data makes ZoomInfo.com an excellent example of a vertical search engine.
What I found interesting about this tie up was:
- The deal is an admission that Microsoft’s CRM products lack a search and retrieval system that meets the needs of Dynamics’ users. I have been critical of the search functions provided with Dynamics and this deal certifies the validity of my analysis.
- Microsoft’s own technology is capable with love and attention of delivering ZoomInfo functionality. The fact that Microsoft’s own engineers cannot use Microsoft scripting tools, content access, and data management tools to perform ZoomInfo.com’s functions tells me quite a bit about the utility of those scripting tools, Microsoft’s data management, and the difficulty of creating a commercial grade solution with those products.
- The deal makes evident that Microsoft’s existing search technology such as Fast ESP cannot deliver the type of mash ups that customers want. Fast ESP demonstrates its Market Track report generation system yet Microsoft itself has elected to use a third party solution. After spending $1.2 billion for Fast Search & Transfer, this bypassing of Fast ESP illuminates what I see as some of the cracks in Microsoft’s existing search products.
Maybe this type of deal won’t make waves in big ponds. But here in the mine run off pond, the ZoomInfo.com tie up is great for ZoomInfo.com and its investors. For this addled goose, a quiet honk of satisfaction.
Stephen Arnold, March 9, 2009
Digital Reef: A Similarity Search Engine
March 9, 2009
Straightaway, there are two “digital reefs”. One is an elearning company. The other–www.digitalreefinc.com–is a content processing company. In my notes, I described the company as offering an “unstructured data management platform.” The headline on the content processing company’s Web site here is “massively scalable”, which is a good thing. The company, according to my notes, was originally Auraria Networks. When an infusion of venture funding arrived, the Digital Reef name was adopted. I’m grateful. I didn’t know how to spell or pronounce “Auraria”. I filed the company under Aura, which was close enough for horseshoes.
Organizations are awash in data, and most are clueless about the nuggets within and about the potential risks the data contain. To get a peek under the hood, you will want to download the company’s white paper here. The document is 13 pages long. You can review it at your leisure. The company’s news release here said:
Digital Reef (www.digitalreefinc.com), one of Matrix Partners’ and Pilot House Ventures’ premier portfolio companies, today announces a new approach to discovering and managing unstructured and semi-structured data. The Digital Reef solution helps large enterprises deal with key business issues that cannot be properly addressed using traditional solutions. These issues include eDiscovery, data risk mitigation, knowledge reuse, and strategic storage initiatives—all of which stem from lack of control over unstructured data, and require a degree of scalability and performance that traditional solutions cannot provide.
The company’s system “was designed to rapidly address very large stores of unstructured data, without manual effort or disruption to data center or business activity.” With the company’s analysis and classification tools, a licensee can:
- Locate specific kinds of data, including sensitive data like Social Security and credit card numbers
- Identify regulated data for compliance
- Pinpoint relevant documents for pending legal action
- Find intellectual property that can be reused for competitive advantage.
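To make the first bullet concrete, here is a minimal sketch of how a content processing tool might flag Social Security and credit card numbers in text. This is not Digital Reef’s method, which I have not seen. The patterns, the function names (`luhn_valid`, `find_sensitive`), and the sample text are my own assumptions for illustration; a commercial system would use far richer rules.

```python
import re

# Hypothetical patterns for illustration only; not taken from any vendor.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Collect SSN-shaped strings and Luhn-valid card-shaped numbers."""
    ssns = SSN_PATTERN.findall(text)
    cards = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        # The checksum weeds out digit runs that merely look like cards.
        if 13 <= len(digits) <= 16 and luhn_valid(digits):
            cards.append(digits)
    return {"ssn": ssns, "card": cards}


if __name__ == "__main__":
    sample = "SSN 123-45-6789, card 4111 1111 1111 1111, bogus 1234 5678 9012 3456."
    print(find_sensitive(sample))
```

Pattern matching alone produces false hits, which is why the sketch adds a checksum for card numbers. Scaling this kind of check across very large stores of unstructured data is the hard part the vendors are selling.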
The company’s Web log with posts from founder and president Steve Askers (a former Lucent executive) is here. Entries are sparse at this time.
Despite the lousy economy, new entrants continue to pursue the content processing sector. With each new system, I chuckle when I read about “simple” and “stable” market conditions. Crazy. I don’t have screenshots in my files nor do I have pricing. On the surface, Digital Reef seems to offer tools that overlap with Inxight Software‘s and Megaputer‘s offerings. I will add the company to my watch list.
Stephen Arnold, March 9, 2009
Site Search Done without Big Bucks
March 9, 2009
I want to call your attention to a useful article by Christian Heilmann called “Site Search on a Shoestring” here. Site search refers to a search box on a single Web site. For example, you can limit your query to my site on Google with the qualifier site:. Mr. Heilmann covers a number of options in his write up. What I liked was his inclusion of scripts which can be edited to taste. Highly recommended.
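For readers who want to see the site: qualifier in action, here is a minimal sketch that builds a Google query URL restricted to one domain. This is my own illustration, not one of Mr. Heilmann’s scripts. The function name `site_search_url` and the domain `example.com` are placeholders I made up.

```python
from urllib.parse import urlencode


def site_search_url(query: str, domain: str) -> str:
    """Build a Google search URL limited to one domain via the site: qualifier."""
    # Appending "site:<domain>" to the query restricts results to that domain.
    restricted = f"{query} site:{domain}"
    return "https://www.google.com/search?" + urlencode({"q": restricted})


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real target.
    print(site_search_url("enterprise search", "example.com"))
```

A search box on a Web site can submit to a URL built this way, which is about as cheap as site search gets.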
Stephen Arnold, March 9, 2009
ODNI Data Mining Report Available
March 8, 2009
If you want to keep a scorecard for data mining projects in some US government agencies, you may find the “Data Mining Report” (unclassified) interesting. You can download a copy here. You will need an acronym knowledgebase to make sense of some of the jargon.
For me, there were two interesting points:
- Video is a sticky wicket: lots of data and the tools are still evolving
- Coordination remains a challenge.
Enjoy.
Stephen Arnold, March 8, 2009
DEMO Search Round Up
March 8, 2009
David Needle’s “Search Takes Center Stage at DEMO” here highlights information processing innovations at this conference. The definition of “search” is broad but I found the write up interesting. Mr. Needle highlights a bookmarking service (Xmarks) and a news aggregation service (Ensembli). This aggregation service prompts the user to enter a term. I tried “enterprise search”, so that’s “search”. The results appear to be a string match.
Ensembli output from www.ensembli.com
Scanning his write up is quicker and less costly than attending this show. Are these services “search”? In my opinion, neither is. Mr. Needle did not mention Evri, a company somewhat closer to the content processing space. Evri, according to my sources, was also at the DEMO conference. The word “search” is like my mother’s handbag–a convenient place to put unrelated objects.
Stephen Arnold, March 8, 2009
Microsoft Fast Pricing Strategy Hint
March 8, 2009
A news release issued in South Africa with the title “3fifteen Gets Inside Track on Microsoft Enterprise Search Expertise” here is a ho hum Certified Partner bag of buzz words except for one statement. The comment in the news release that caught my attention was:
“Given the price tag, and strong focus on the technology sets, Microsoft is clearly investing heavily in information access and representation as the next technology wave that businesses will need to increase business productivity and profitability in many instances.”
The conjunction of “price tag” and “clearly investing” was interpreted by my addled goose brain as “low cost” and “buying market share”.
One of the threats Microsoft poses to established vendors such as Autonomy and Endeca is making Fast search technology an item included with larger Microsoft SharePoint and server buys. With this one action, no money conscious company will turn up its nose at Microsoft’s assurances that Fast ESP (enterprise search platform) is the greatest thing since SharePoint.
A low price deal for Fast ESP would put significant pressure on the more expensive solutions. Most organizations are deeply dissatisfied with their search solutions, so giving a low cost product a spin makes perfect sense.
If the 3fifteen wording is just breezy writing, that’s one thing. If the 3fifteen statement is meaty, I think a price skirmish probe may be likely. Many search and content processing companies are working overtime to hit their revenue targets. A bargain in enterprise search could affect a number of companies in the present financial thunder storm.
Not too much stability in the SharePoint search sector in my opinion.
Stephen Arnold, March 8, 2009
Twitter: SWAT or Sissy
March 8, 2009
Farhad Manjoo’s “What the Heck Is Twitter?” here joins the team suggesting that Twitter is a sissy; that is, Twitter can’t kill Google. Google is a tough customer. Underneath those primary colors, Google has a dark core. Mr. Manjoo points out that some blogeratti see Twitter as a SWAT team able to take out Google. Google has “special” search engines. Real time search is a category of search. Twitter has “a great future” (maybe) but it does have the T shirt that says, “Fail whale.”
You should read the Slate story because the online publication has considerable clout, certainly much more than the feather duster the addled goose brandishes.
I would offer several observations:
First, Twitter has a content stream and search is a relatively recent trendlet for Twitter. Twitter is primarily about inconsequential content that when passed through a user filter–that is, a query–can yield timely information. The point, therefore, is that the content can yield nuggets. These are not necessarily “correct”. Google doesn’t have at this time the content flow. Real time search is a logical jump to information that offers the pre-cognitive insights much loved by some analysts (business and intelligence).
Second, Google has been a company with great potential and game changing technology. Twitter may flop. But it has become for me an example of a segment that Google has not been quick to seize either with its own technology or with its Google bucks. Twitter is not my go to search engine, but it has become a case example of a company that has managed to make clear Google’s inability to decide what to do and then do it with the force of will the company demonstrated between 2003 (pre Yahoo Overture settlement) and 2006. Since 2007, Google has been, in my opinion, showing signs of bureaucratic indigestion.
Third, users of Twitter see the utility of the service. My hunch is that if I showed Twitter to my father’s friends at his Independent Village lunch group, no one would know what the heck Twitter is, why anyone would send a message, or what possible value a Tweet like “I am stuck in traffic” has. Show Twitter to a group of sixth graders, and I think the uptake will be different. That’s what’s important. Who cares if someone over 25 understands Twitter? The demographics point to a shift in the notion of timeliness expectations of users. To me, Twitter is making clear an opportunity from micro blog message traffic.
Therefore, I am not a Twitter user. I have an expert on staff who sends Tweets as Ben Kent, so we can see how the system interacts with the Twitter-sphere. I am an addled goose, but I am coherent enough to look at the service and see possibilities. I would opine that unless Google, Microsoft, and Yahoo respond to this opportunity, Twitter may become much more than a wonky service with a “Fail whale” T shirt.
Stephen Arnold, March 8, 2009
YAGG: Google Docs Sharing Quirk
March 8, 2009
TechCrunch’s Jason Kincaid’s “Google Privacy Blunder Shares Your Docs Without Permission”, if on the money, revealed yet another alleged Google glitch here. The issue pertains to inadvertent sharing of Google Docs. Mr. Kincaid wrote citing a Google generated message:
this sharing was limited to people “with whom you, or a collaborator with sharing rights, had previously shared a document” – a vague statement that sounds like it could add up to quite a few people. The notice states that only text documents and presentations are affected, not spreadsheets, and provides links to each of the user’s documents that may have been affected.
To be on the safe side, sensitive Google Docs might be happier on your local computing device. The addled goose loves things Google. YAGGs make the goose nervous. You may be different–at least until the alleged matter is clarified.
Stephen Arnold, March 8, 2009