Wolfram Bids for Dominance in the Search Pack

March 9, 2009

A happy quack to the reader who alerted me to this news story: “Wolfram Alpha: Next Major Search Breakthrough?” here. The system, according to Dan Farber, is called Alpha: Computational Knowledge Engine. The name alone puts to rest any consultant baloney about simplicity and stability in search. Stephen Wolfram is the author of Mathematica, the gold standard in equation crunching. He has whipped out a couple of two-pound books that will give most liberal arts grads a migraine. A New Kind of Science has 1,200 pages and lots of equations. Yummy. More detail than a William Carlos Williams poem too.

The new system becomes available in May 2009. Not surprisingly, Dr. Wolfram uses lots of math to make the computational knowledge engine sit up and roll over. I don’t have any information in my files about Alpha. You can get the facts from Mr. Farber’s write up.

One item that caught my attention was:

Google would like to own it [Alpha].

With Twitter deemed an also-ran by the GOOG, maybe Dr. Wolfram’s math will catch the company’s eye. More information as I find it.

Stephen Arnold,  March 9, 2009

Searching Twitter

March 9, 2009

At dinner on Saturday night, the conversation turned to Twitter. One of the guests asked, “Why would I want to use Twitter?” Another asked, “What’s it good for?” I listened. I will forward to each person in the dinner party Chris Allison’s “Welcome to the Hive Mind: Learn How to Search Twitter” here. Mr. Allison does a good job of documenting Twitter’s real time search system. If you too are baffled by Twitter, read the article and give Twitter a whirl. Join the growing number of intelligence, law enforcement, and business intelligence professionals who are also learning about real time search. Note: most of the information in a Tweet is inconsequential. Aggregated, the micro blog posts are useful.
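
For those who want to poke at the plumbing, the mechanics are simple enough to script. Below is a minimal sketch in Python against Twitter’s public search API as documented at the time (the search.twitter.com JSON endpoint with its q and rpp parameters). It illustrates the idea only; it is not Mr. Allison’s code, and the endpoint details are Twitter’s to change.

```python
import json
import urllib.parse
import urllib.request

def search_twitter(query, results_per_page=15):
    """Fetch tweets matching a query from the public search endpoint
    (search.twitter.com as documented in 2009; subject to change)."""
    params = urllib.parse.urlencode({"q": query, "rpp": results_per_page})
    url = "http://search.twitter.com/search.json?" + params
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    # Each result carries the author, timestamp, and tweet text.
    return [(r["from_user"], r["created_at"], r["text"])
            for r in data.get("results", [])]

if __name__ == "__main__":
    # Aggregate view: one tweet is trivia; a screenful shows a trend.
    for user, when, text in search_twitter("enterprise search"):
        print(f"{when}  @{user}: {text}")
```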

Stephen Arnold, March 9, 2009

Search: Still in Its Infancy

March 9, 2009

Click here and read the job postings for intelligence professionals. Notice that the skills required include an ability to manipulate information, not just in English but in other languages. Here’s a portion of one posting:

Core Collector-certified Collection Management Officers (CMO’s) oversee and facilitate the collection, evaluation, classification, and dissemination of foreign intelligence developed from clandestine sources. CMO’s play a critical role in ensuring that foreign intelligence collected by clandestine sources is relevant,

I keep reading that search is stable and search is simple. I don’t think so. Because language is complex, the challenge for search and content processing vendors is significant. With more than 150 systems available to manipulate information, one would think that software could handle basic collection and analysis, right? Software helps, but search is still in its infancy. The source of the jobs? The US Central Intelligence Agency, which is reasonably well equipped with search, text processing, and content analysis systems. Too bad the reality of search is complex, but some find it easy to say the problem is solved and move on in a fog of wackiness.

Stephen Arnold, March 9, 2009

Microsoft Bets on Improved Web Search

March 9, 2009

I saw this story on March 4, 2009, and I came back to it today (March 8, 2009). I thought I could locate my Microsoft Web search timeline. Alas, it eludes me. I have been keeping track of the “improvements” and other Web search initiatives for a number of years. The list is of modest interest. The entries are little more than a sequence of dates and the Web search actions Microsoft took. When Microsoft bought Powerset, a provider of semantic search demonstrated on Wikipedia (a popular corpus for vendors), I made a note in July 2008: Powerset technology is based in part on older Xerox PARC semantic components.

The story “Microsoft Eyes Better Searches, Bigger Market Share”, via Newsfactor but available to me here, said:

Microsoft is testing features that will give searchers organized results to save time, according to Nadella [the Microsoft search wizard du jour]. A feature has been added on the left side of the results pages to give users access to tools to help complete various tasks. The company has also added other features like single-session history and hover preview.

What I found more interesting was the data (maybe assertions?) included in the write up; for example:

  • 40 percent of search queries go unanswered
  • Half of the queries are about searchers returning to previous tasks
  • 46 percent of sessions are longer than 20 minutes.

As I read this, I thought back to the phone call I received when I pointed out that search was pretty awful. The person on that call, whose name I can’t recall, told me that Microsoft had a system that made my criticism of search in general inapplicable to Microsoft. That call was in 2006, when I was finishing the third and final edition of the Enterprise Search Report that I wrote. (Hooray! I was done with a 600-page encyclopedia.)

But this news story made it clear, to me at least, that search is a work in progress. And the issues addressed in the article and highlighted with the data above suggest to me that Microsoft wants to move from Web search to some richer information-centric application; for example, “tools”.

Google enjoys a big lead over Ask.com, Microsoft, and Yahoo in Web search. Over the last year, Google has maintained its lead and in some sectors increased it as Ask.com and Microsoft lost share and Yahoo held steady or experienced fractional increases in usage.

The secret to Web search is anchored in traffic. Lots of traffic dilutes many search sins. Microsoft has to generate traffic. That’s a tough job, and I don’t think a new brand and tools will do the job. Microsoft has tried this before. I remember weird little butterflies stuck to buildings and sidewalks in New York. If I had my timeline, I would have the date. Seems like only yesterday.

Stephen Arnold, March 9, 2009

Digital Reef: A Similarity Search Engine

March 9, 2009

Straightaway, there are two “digital reefs”. One is an e-learning company. The other (www.digitalreefinc.com) is a content processing company. In my notes, I described the company as offering an “unstructured data management platform.” The headline on the content processing company’s Web site here is “massively scalable”, which is a good thing. The company, according to my notes, was originally Auraria Networks. When an infusion of venture funding arrived, the Digital Reef name was adopted. I’m grateful. I didn’t know how to spell or pronounce “Auraria”. I filed the company under Aura, which was close enough for horseshoes.

Organizations are awash in data, and most are clueless about the nuggets within and about the potential risks the data contain. To get a peek under the hood, you will want to download the company’s white paper here. The document is 13 pages long. You can review it at your leisure. The company’s news release here said:

Digital Reef (www.digitalreefinc.com), one of Matrix Partners’ and Pilot House Ventures’ premier portfolio companies, today announces a new approach to discovering and managing unstructured and semi-structured data. The Digital Reef solution helps large enterprises deal with key business issues that cannot be properly addressed using traditional solutions. These issues include eDiscovery, data risk mitigation, knowledge reuse, and strategic storage initiatives—all of which stem from lack of control over unstructured data, and require a degree of scalability and performance that traditional solutions cannot provide.

The company’s system “was designed to rapidly address very large stores of unstructured data, without manual effort or disruption to data center or business activity.” With the company’s analysis and classification tools, a licensee can do the following (a rough sketch of the first capability appears after the list):

  • Locate specific kinds of data, including sensitive data like Social Security and credit card numbers
  • Identify regulated data for compliance
  • Pinpoint relevant documents for pending legal action
  • Find intellectual property that can be reused for competitive advantage.
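
Digital Reef does not disclose how its classifiers work, so treat this Python fragment as a toy sketch of what the first bullet implies: regular-expression pattern matching for US Social Security numbers and credit card numbers, with a Luhn checksum to weed out look-alike digit strings. The patterns, names, and sample text are my illustration, not the company’s code.

```python
import re

# Hypothetical patterns for illustration; a real policy would be tuned.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13-16 digits

def luhn_valid(digits):
    """Luhn checksum: weeds out digit strings that merely look like cards."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_sensitive(text):
    """Return (label, match) pairs for apparent SSNs and card numbers."""
    hits = [("ssn", m.group()) for m in SSN_PATTERN.finditer(text)]
    for m in CARD_PATTERN.finditer(text):
        if luhn_valid(re.sub(r"[ -]", "", m.group())):
            hits.append(("card", m.group()))
    return hits

print(find_sensitive("SSN 078-05-1120, card 4111 1111 1111 1111"))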

The company’s Web log with posts from founder and president Steve Askers (a former Lucent executive) is here. Entries are sparse at this time.

Despite the lousy economy, new entrants continue to pursue the content processing sector. With each new system, I chuckle when I read about “simple” and “stable” market conditions. Crazy. I don’t have screenshots in my files, nor do I have pricing. On the surface, Digital Reef seems to offer tools that overlap with Inxight Software‘s and Megaputer‘s offerings. I will add the company to my watch list.

Stephen Arnold, March 9, 2009

Site Search Done without Big Bucks

March 9, 2009

I want to call your attention to a useful article by Christian Heilmann called “Site Search on a Shoestring” here. Site search refers to a search box on a single Web site. For example, you can limit your query to my site on Google with the qualifier site:. Mr. Heilmann covers a number of options in his write up. What I liked was his inclusion of scripts, which can be edited to taste. Highly recommended.
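
Mr. Heilmann’s scripts are the main course; as a side illustration only, here is the site: qualifier trick reduced to a few lines of Python. The engine URL and q parameter follow Google’s standard query string; the domain is a placeholder you would swap for your own.

```python
import urllib.parse
import webbrowser

def site_search_url(site, terms, engine="https://www.google.com/search"):
    """Build a query URL restricted to one site via the site: qualifier."""
    return engine + "?" + urllib.parse.urlencode({"q": f"site:{site} {terms}"})

# Placeholder domain; swap in your own site.
url = site_search_url("example.com", "enterprise search")
print(url)
webbrowser.open(url)  # hand the query to the default browser
```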

Stephen Arnold, March 9, 2009

Microsoft Fast Pricing Strategy Hint

March 8, 2009

A news release issued in South Africa with the title “3fifteen Gets Inside Track on Microsoft Enterprise Search Expertise” here is a ho-hum Certified Partner bag of buzz words except for one statement. The comment in the news release that caught my attention was:

“Given the price tag, and strong focus on the technology sets, Microsoft is clearly investing heavily in information access and representation as the next technology wave that businesses will need to increase business productivity and profitability in many instances.”

The conjunction of “price tag” and “clearly investing” was interpreted by my addled goose brain as “low cost” and “buying market share”.

One of the threats Microsoft poses to established vendors such as Autonomy and Endeca is making Fast search technology an item included with larger Microsoft SharePoint and server buys. With this one action, no money-conscious company will turn up its nose at Microsoft’s assurances that Fast ESP (enterprise search platform) is the greatest thing since SharePoint.

A low-price deal for Fast ESP would put significant pressure on the more expensive solutions. Most organizations are deeply dissatisfied with their search solutions, so giving a low-cost product a spin makes perfect sense.

If the 3fifteen wording is just breezy writing, that’s one thing. If the 3fifteen statement is meaty, I think a price skirmish may be likely. Many search and content processing companies are working overtime to hit their revenue targets; a bargain in enterprise search could affect a number of companies in the present financial thunderstorm.

Not too much stability in the SharePoint search sector in my opinion.

Stephen Arnold, March 8, 2009

Twitter: SWAT or Sissy

March 8, 2009

Farhad Manjoo’s “What the Heck Is Twitter?” here joins the team suggesting that Twitter is a sissy; that is, Twitter can’t kill Google. Google is a tough customer. Underneath those primary colors, Google has a dark core. Mr. Manjoo points out that some of the bloggerati see Twitter as a SWAT team able to take out Google. Google has “special” search engines. Real time search is a category of search. Twitter has “a great future” (maybe), but it does have the T-shirt that says, “Fail whale.”

You should read the Slate story because the online publication has considerable clout, certainly much more than the feather duster the addled goose brandishes.

I would offer several observations:

First, Twitter has a content stream, and search is a relatively recent trendlet for Twitter. Twitter is primarily about inconsequential content that, when passed through a user filter (that is, a query), can yield timely information. The point, therefore, is that the content can yield nuggets. These are not necessarily “correct”. Google does not have the content flow at this time. Real time search is a logical jump to information that offers the pre-cognitive insights much loved by some analysts (business and intelligence).

Second, Google has been a company with great potential and game changing technology. Twitter may flop. But it has become for me an example of a segment that Google has not been quick to seize, either with its own technology or with its Google bucks. Twitter is not my go-to search engine, but it has become a case example of a company that has made clear Google’s inability to decide what to do and then do it with the force of will the company demonstrated between 2003 (pre-Yahoo Overture settlement) and 2006. Since 2007, Google has been, in my opinion, showing signs of bureaucratic indigestion.

Third, users of Twitter see the utility of the service. My hunch is that if I showed Twitter to my father’s friends at his Independent Village lunch group, no one would know what the heck Twitter is, why anyone would send a message, or what possible value a Tweet like “I am stuck in traffic” has. Show Twitter to a group of sixth graders, and I think the uptake will be different. That’s what’s important. Who cares if someone over 25 understands Twitter? The demographics point to a shift in users’ expectations of timeliness. To me, Twitter is making clear an opportunity in micro blog message traffic.

To be clear, I am not a Twitter user. I have an expert on staff who sends Tweets as Ben Kent, so we can see how the system interacts with the Twitter-sphere. I am an addled goose, but I am coherent enough to look at the service and see possibilities. I would opine that unless Google, Microsoft, and Yahoo respond to this opportunity, Twitter may become much more than a wonky service with a “Fail whale” T-shirt.

Stephen Arnold, March 8, 2009

Deep Peep

March 7, 2009

A happy quack to the reader who sent me a link to the Deep Peep beta. You can try the beta of the deep Web search engine here. The site said here:

DeepPeep is a search engine specialized in Web forms. The current beta version tracks 13,000 forms across 7 domains. DeepPeep helps you discover the entry points to content in Deep Web (aka Hidden Web) sites, including online databases and Web services. This search engine is designed to cater to the needs of casual Web users in search of online databases (e.g., to search for forms related to used cars), as well as expert users whose goal is to build applications that access hidden-Web information (e.g., to obtain forms in job domain that contain salary, or discover common attribute names in a domain). The development of DeepPeep has been funded by National Science Foundation award #0713637 III-COR: Discovering and Organizing Hidden-Web Sources.

Deep Web is one of those buzz words that wax and wane. For many years Bright Planet and Deep Web Technologies have been the systems I associated with indexing content behind passwords and user names. I wrote a report about Google’s programmable search engine in 2007. The PSE contains some “deep Web” functionality, but the GOOG exposes only a fraction of its “deep Web” capabilities to the adoring millions who use the Google search system. An example of a typical “deep Web” data set might be the flight information and prices available at an airline site or the information available to a registered user of an online service. Dealing with “deep Web” issues is a lot of work. Manual fixes to spider scripts are expensive and time consuming. The better “deep Web” systems employ sophisticated methods that eliminate most of the human fiddling required to navigate certain services.
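
To make “entry points” concrete: the first chore for a deep Web system is finding the query forms themselves. DeepPeep’s crawler is far more sophisticated, and its internals are not published in the announcement; the Python fragment below is a toy sketch of that first step only, collecting each form’s action URL and field names from a page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class FormFinder(HTMLParser):
    """Collect each form's action URL and field names from an HTML page;
    these are the 'entry points' a deep Web index catalogs."""
    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({"action": attrs.get("action", ""), "fields": []})
        elif tag in ("input", "select", "textarea") and self.forms:
            if attrs.get("name"):
                self.forms[-1]["fields"].append(attrs["name"])

def find_forms(url):
    parser = FormFinder()
    with urlopen(url) as page:
        parser.feed(page.read().decode("utf-8", errors="replace"))
    return parser.forms  # e.g., [{'action': '/search', 'fields': ['q']}]
```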

Today quite a few systems have “deep Web” capability but don’t use that phrase to describe themselves. Here’s a screen shot from my test query for “search”. I used the single word “search” because the word pair “enterprise search” returned results that were not useful to me.

[Screenshot: DeepPeep results for the test query “search”]

Give the new system a spin and share your opinions in the comments section of this Web log.

Stephen Arnold, March 7, 2009

Censoring Search

March 7, 2009

The Japan Today Web site ran “Google, Yahoo!, Microsoft Urged Not to Censor Search” here. The article does a good job of summarizing the hoo hah over various Internet filtering efforts. The most interesting paragraph to me was:

RSF [Reporters without Borders] and Amnesty said that currently, “there are more than two dozen countries restricting Internet access on a regular basis.” They said they “understand the challenges of operating in countries that restrict Internet access; these countries are trying to pressure you to obey local laws that do not comport with international law and standards that protect freedom of expression.” “But complying with local demands that violate international law does not justify your actions,” they said.

The point that struck me was the implicit assumption that Web indexes are not now filtered or in some way shaped. The broader filtering is not so much new as newly in the public eye. Consequently, writers who want a free Internet with every site available may want to do a bit more digging into what Web indexing and directory outfits have been doing for a long time.

At The Point (Top 5% of the Internet) in 1993 (yep, that’s 16 years ago, folks) we built a business on filtering out porn, hate, and other types of sites we defined as inappropriate in our editorial policy. Since those early days of online directories and indexes, content is either not processed, skipped by the crawler, or blocked in the indexes.

Free and open. Sounds great. Not part of the fabric of most indexing operations. If you can’t figure out why, you qualify as an azure chip consultant, fully equipped to advise government entities, non-profit institutions, and commercial entities about search, online access, and content. For me, filtering is the *only* way to approach online content. I filter for behind-the-firewall search with a vengeance. Why? You want the stuff in your laptop’s folders in the organization’s index? I filter with the force of legal guidance for eDiscovery. Why? You want to run afoul of the applicable laws as they apply to eDiscovery and redacting? I filter for libraries. Why? You want the library to create problems for patrons with problematic Web sites or malware? No, I didn’t think so.
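
My filtering setups are not spelled out in this post, so take the following Python fragment as a toy sketch of the general idea only: an index-time gate that applies an editorial policy before a document is admitted. The path patterns and blocked terms are hypothetical placeholders, not anyone’s actual policy.

```python
import fnmatch

# Hypothetical editorial policy; real rules come from legal guidance.
BLOCKED_PATHS = ["*/Personal/*", "*.pst", "*/Drafts/*"]
BLOCKED_TERMS = {"privileged", "attorney-client"}

def admit_to_index(path, text):
    """Gate applied before indexing: True only if the document passes."""
    if any(fnmatch.fnmatch(path, pat) for pat in BLOCKED_PATHS):
        return False
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

docs = [("/share/Reports/q1.doc", "Quarterly results look solid."),
        ("/laptop/Personal/diary.txt", "Today I grumbled about search.")]
indexable = [(p, t) for p, t in docs if admit_to_index(p, t)]
print(indexable)  # only the report survives the filter
```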

Free and open. Silliness. Poke around and find out what the guidelines are for content at some of the high profile Web indexing and content companies. If you find a free and open index other than a dark net, shoot me an email at seaky2000 at yahoo dot com. I will check it out.

Stephen Arnold, March 7, 2009
