Blekko: More Dough, Same Splash Page
June 5, 2008
In a phone call today, a 20-something complained about my failure to write about Blekko. I said I would check my files. There’s not much information about this new search engine. The few facts in my files peg the company to Rich Skrenta. He was involved in a news site called Topix.net that I used a couple of years ago and then dropped from my “A” list. (More about this above my signature block.)
His new company attracted some Silicon Valley incredible hulks with money; for example, Marc Andreessen (Netscape, Ning, and several other high profile ventures).
The major news, which I overlooked until today, is that the company has raised about $3 million. Among the backers are Baseline Ventures and some ex-Googlers (a better term is Xoogler, which some Googlers prefer). This information came from the very useful Web log PaidContent.org here. TechCrunch does its usual very good job of providing the basics with some intriguing color here.
The Blekko Web site here features a modest amount of information. There’s a link for the press and one for jobs. Oh, the Web site has a snapshot that puts my data bunny to shame. My hunch is that the whimsical image will annoy certain tight-collar MBAs, but I like the picture.
This is either a “blekko” or a very interesting programmer from MIT or CalTech.
I haven’t been given a demo. I did hear that the company will be using “advanced algorithms” and “semantic technology”. I’m not sure what this means exactly, but I have added Blekko.com to my watch list.
Attensity: Packaging Text Processing for Higher Value Applications
June 5, 2008
Enterprise search is like a poinsettia three weeks after the holidays. The form of the lovely plant remains, but the color is gone. Poinsettias look unhealthy, and my mother callously tossed them in the trash.
Attensity has been working to take its core content processing technology and apply it to problems where search-and-retrieval won’t work or has already failed. With a modest cash infusion from the CIA’s not-so-secret venture arm, Attensity refined its “deep extraction” technology and looked for big problems that remained unresolved by other vendors.
For example, customer support is a sore spot. It’s expensive. It’s hard to manage because turnover often soars to 50 to 60 percent per year. Automation remains blind to important clues in a customer email or voice call. Many systems can figure out that “I’m going to sue you” is a negative message. But most don’t know what 🙁 means.
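The gap can be illustrated with a toy lexicon-based scorer. This is a sketch of the general technique only, not Attensity’s system; the lexicon terms and weights are invented for illustration. The point is that an emoticon carries no signal unless someone thought to put it in the lexicon.

```python
# Toy lexicon-based sentiment scorer. Terms and weights are invented;
# a real system would use far richer linguistic analysis.
NEGATIVE_TERMS = {"sue": -3, "broken": -2, "refund": -1, "🙁": -2, ":(": -2}
POSITIVE_TERMS = {"thanks": 2, "great": 2, "🙂": 2}

def sentiment_score(message: str) -> int:
    """Sum lexicon weights for each whitespace-separated token."""
    score = 0
    for token in message.lower().split():
        score += NEGATIVE_TERMS.get(token, 0)
        score += POSITIVE_TERMS.get(token, 0)
    return score

print(sentiment_score("i am going to sue you 🙁"))  # -5: both cues caught
```

Drop 🙁 from the lexicon and the same message scores -3: the system “reads” the threat but not the frown.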
Attensity has taken its rocket science technology and created MarketVoice. According to Insurance Technology, a CMP publication, MarketVoice is:
a new solution enabling insurers to track, analyze and act on customer conversations in blogs, Web forums, product review comments, and other forms of online customer exchanges
Please, read the original story by Kristi Cattafi here. Do this quickly. CMP, like other traditional publishers, takes some interesting angles on its own search and retrieval system. Sometimes it is very good. Other times, it is a bit disappointing.
MarketVoice uses the deep extraction technology, but the system figures out where problems may be warming to a boiling point. Attensity has made its system easier to set up than some of the others that claim similar functions. You may be familiar with ClearForest, now part of Reuters, which is now part of Thomson, a multi-national professional information company. Attensity’s approach strikes me as easier to set up and more nimble. Your perception may differ from mine, but I think Attensity’s MarketVoice is a wake-up call to vendors of text processing systems that are designed to do one function, leaving the licensee the job of integrating the system’s outputs. Attensity delivers a product. Others deliver programming tool kits.
The company has also swizzled its deep extraction invention to process content on Web logs. Web log content is often hard to figure out. Some comments are declarative. Some are tongue in cheek. Others are spoofs; for example, today I received a comment from a person claiming to be a Googler. Google does not interact directly with me. This is an “old” Google-conceived rule. This spoofer tipped his hand by contacting me directly. That type of context is beyond the ken of text processing systems. Not even Attensity can figure out the sub text for the alleged Google post and my remarks in this paragraph.
Most text processing systems can’t figure out the context of the information, so indexing these primary and secondary components of an article and figuring out what the link means is not trivial. Attensity’s system grinds through text on a Web log and generates reports about customer sentiment. Attensity’s approach is useful, and it works quite well. You can read more about this system here. If the link 404s, just navigate to www.attensity.com and poke through the information on the site.
Dr. David Bean, a wizard with a passion for language, has been aggressive in his push to make rocket science useful to mere mortals.
Attensity’s productizing of content analysis is a good example of how to grow a market without making your customers withhold their licensing fees. The company is focusing on large back office specialists. More information about this MarketVoice application is here.
As the screws tighten on vendors of pure search or stand-alone text processing software, studying Dr. Bean’s retooling of his rocket science technology may be useful. Attensity is a bit ahead of some of its competitors. Companies with sagging revenues may want to bone up on Attensity’s business model sooner rather than later.
I flagged Attensity as a company to watch in my April 2008 study Beyond Search.
Stephen Arnold, June 5, 2008
Wikia Search: Social Search Is Blooming
June 4, 2008
I haven’t done much thinking about social search. Years ago I saw a demonstration of Eurekster (now Eurekster’s Swicki), and I thought sites suggested by users were interesting. As the Internet expanded, a small collection of recommended sites seemed useful. We built Point (Top 5% of the Internet) in 1993, eventually selling the property to CMGI’s Lycos unit. Social search was a variation on Point without the human editorial staff we relied upon 15 years ago.
Wikia: User-Modifiable Results
The big news in the last 24 hours is the sprucing up of the Wikia Search system. The venture is a result of Jimmy Wales’ creative nature. If you have not tried the system, navigate here and fire several queries at the system. It’s much more comprehensive than the system I tested several months ago. I still like the happy cloud logo.
I ran the query “enterprise search” on the system. The result was a pointer to Northern Light. The second result was a pointer to the enterprise search entry in Wikipedia. So far so good. What sets Wikia apart is that I can use an in-browser editing function to change a hit’s title. I can also move results up and down the page. I can see how that would be useful, but I save interesting hits to a folder. I then return to these saved files and conduct more in-depth investigations. So, the system generates results that are useful to me, contains a dollop of community functionality, and sports a larger index. You can read more about the system on Webware.com, which has a useful description of the service here.
Vivisimo’s Social Search
In New York at the Enterprise Search Summit, someone asked me, “Have you seen Vivisimo’s new social search system?” My answer was, “No, I don’t know much about it.” When I returned to my office, I found a link to Vivisimo’s explanation of social search. Vivisimo announced this function in October 2007, and I think the catchphrase hooked some people at the New York show. You can read the announcement here.
The point that resonated with me is:
Enabling users to vote on, rate, tag, save and share content within the search interface is just the first step in creating a collaborative information-enriching environment. Velocity 6.0 allows users to add their own knowledge about information found via search directly into the search result itself in the form of free-text annotation.
In this context, social search means that I can add key words or tags to an item processed by Vivisimo. The term is added to the index. If I provide that term to a colleague, the index term can be used to retrieve the document. An interactive tagging feature is useful, but it is not the type of functionality I use. Others may find the feature exactly what is needed to make behind-the-firewall search less frustrating.
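The mechanism is simple enough to sketch. This is a minimal illustration of the tag-into-index idea, not Vivisimo’s implementation; the function names and document ids are invented. A user-supplied tag is folded into the same inverted index the engine uses, so a colleague who searches on that tag retrieves the document.

```python
from collections import defaultdict

# Inverted index: term -> set of document ids. A user tag is indexed
# exactly like an ordinary term extracted from the document text.
index = defaultdict(set)

def index_document(doc_id: str, text: str) -> None:
    for term in text.lower().split():
        index[term].add(doc_id)

def add_user_tag(doc_id: str, tag: str) -> None:
    index[tag.lower()].add(doc_id)  # tag becomes a first-class index term

def search(term: str) -> set:
    return index.get(term.lower(), set())

index_document("doc-42", "quarterly sales figures for the widget line")
add_user_tag("doc-42", "budget-2008")
print(search("budget-2008"))  # {'doc-42'}
```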
Social search taps into the wisdom of crowds. Some crowds are calm, even thoughtful. Others can be a management opportunity.
Baynote
Today I received an email from a colleague asking, “Did you see the social search study published by Baynote, Inc.?” Once again, the answer was, “No, I don’t think so.” I clicked on a link and went through a registration process (easily spoofed) to download a PDF of the six-page report.
Baynote is a company specializing in “on demand recommendations and social search for Web sites.” You can explore the company’s Web site here. I didn’t read the verbiage on the Web page. I clicked in the search box and entered my favorite test query, beyond search. No joy. The three hits were to information about Baynote. (The phrase beyond search sent to Clusty.com delivers a nice link to this Web log, however.)
I clicked back to the PDF report and scanned it. The main idea I garnered from the white paper is:
Baynote combines a site’s existing search engine results with community wisdom to produce a set of optimized results that is proven to yield greater conversions, longer engagement, and improved satisfaction. Thus, Social Search can be thought of as a community layer on top of the site’s existing search engine. The original search results may be re-ordered in the process, and the augmented results may include additional results that weren’t originally produced by the search engine, but proven to be valuable to your Web site visitors. Because Baynote is delivered as SAAS (software as a service), it can be live on a Web site in as little as 30 days with little or no development, installation or configuration.
If you have an existing search system, you can use Baynote as an add-on. With minimal hassle, you can rank results using the Baynote algorithms, monitor user behavior to shape search results, generate See Also references, and merge results from different collections.
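The add-on idea can be sketched as a re-ranking step layered over an existing engine. This is speculation about the general approach, not Baynote’s actual algorithm; the blending weight and the click-count signal are my assumptions.

```python
# Blend a site's native search ranking with a community signal
# (here, raw click counts) to re-order results. The linear blend and
# the normalization are illustrative assumptions only.
def rerank(results, clicks, weight=0.5):
    """results: list of (doc_id, engine_score); clicks: doc_id -> count."""
    max_clicks = max(clicks.values(), default=1) or 1
    def blended(item):
        doc_id, score = item
        community = clicks.get(doc_id, 0) / max_clicks
        return (1 - weight) * score + weight * community
    return sorted(results, key=blended, reverse=True)

base = [("a", 0.9), ("b", 0.8), ("c", 0.4)]
community_clicks = {"c": 120, "b": 10}
print(rerank(base, community_clicks))  # "c" rises to the top on clicks
```

Because the layer only re-orders (or augments) what the native engine returns, it can sit beside an existing search system instead of replacing it, which is what makes the 30-day SaaS claim plausible.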
I’m going to update my mental inventory about search, adding social search to the list of search types that I lug around in my head.
Observations
I do have reservations about social anything. I’m 85 percent convinced that the Vivisimo and Baynote approaches have merit. But I want to end this short item with these observations:
- Social anything can be spoofed. When I visited Los Alamos National Labs, people with access to the facility fiddled with hard drives and other digital assets. If this stuff can happen at a security-conscious facility, imagine what a summer intern can do with social search in your organization.
- Users often have very good ideas about content. Other users have very bad ideas about content. When there are lots of clicks, then the likelihood of finding something useful edges up. The usefulness of Delicious and StumbleUpon is evidence of this. However, when there are comparatively few clicks, I’m inclined to exercise some extra caution. Tina in the mail room is a great person, but I’m not sure I trust her judgment on the emergency core cooling system schematics.
- The lightweight approach to tagging is not going to yield the type of information that a system like Tacit Software’s provides. If you want social, then take a look at Tacit’s Active Net system here.
- My hunch is that nearly invisible monitoring systems will yield more, higher quality insights about information. In some of my work, I’ve had access to outputs of surveillance systems. The data are often quite useful and generally bias-free. Human systems have humanity’s fingerprints on the data, which can obscure some important items.
Social search can be quite useful. Its precepts work quite well in high-traffic environments. In click-sparse environments, a different type of tool is required to ferret out the important people and information.
Stephen Arnold, June 4, 2008
Search: Habits vs Environments
June 2, 2008
In 1980, when you launched the Dialog Information Service search function, the system dumped you into a database about education. From that starting point, you entered a file number. Experienced searchers memorized file numbers; type b 15 and you would be “in” the ABI / INFORM business information file. Type b 16 and you would be able to search PROMT, a collection of business data. Dialog never saw bulletin board systems or the Internet coming.
People fortunate enough to have the money and technical savvy could become online searchers. The technology was sufficiently clumsy and the entire process so unfamiliar to most people as to make online searching an arcane art. Searching in those early days was handled by an intermediary. When I first learned about online databases at Booz, Allen & Hamilton in 1976, the intermediary was the New York office’s boss. I would call the intermediary, explain what I needed, provide a project number, and pick up the outputs on weird thermal paper later that day. As clumsy and expensive as the process was, it was more efficient than doing research with paper journals, printed books, and the horrific microfilm.
By 1983, Dialog had found a market for its mainframe-based search system–librarians. Librarians had two characteristics that MBAs, lawyers, and folks trained in brochure making lacked. First, librarians chose a discipline that required an ability to think about categories. Librarians also understood the importance of having a standard way to identify authors, titles, and subjects.
Second, librarians had a budget to meet the needs of people described as an “end user”. Some of my Booz, Allen colleagues would rush into our corporate library and demand, “Give me everything on ECCS!”
The approach taken by System Development Corporation (SDC Orbit), BRS (Bibliographic Retrieval Service), DataStar, and the handful of other online vendors was monetized in clever ways. First, a company would pay money to sign up to get a password. Second, the company would send the librarian to training programs. Most programs were free and taught tips and tricks to tame the naked command line. No graphical user interface.
You had to memorize command strings like this one: SS UD=9999 and CC=76?. The system then spit out the most recent records about marketing. The key point is not the complexity. The point is that you had to form specific habits to make the system work. Make an error and the system would deliver nothing useful. Search and retrieval was part puzzle, part programming, and part memorization. At the time, I believed that these habits would be difficult to break. I think the vendors saw their users as hooked on online in the way a lifelong smoker is hooked on nicotine.
The vendors were wrong. The “habit” was not a habit. The systems were confining, hellishly expensive, and complicated to such a degree that change was hard for the vendors. Change for the people who knew how to search was easy. The automatic behavior that worked so well in 1980 began to erode when PCs became available. When the first browser became available, the old solid gold revenue streams started to slip. The intermediaries who controlled online were disintermediated. The stage was set for the Internet, lowest-common-denominator searching, and graphical interfaces. The Internet offered useful information for free. I have dozens of examples of online budgets slashed or eliminated because neither the vendor nor the information professional could explain the value of online information. A visible, direct cost with no proof of payback crippled the original online industry. Many of the companies continue to hang on today, but these firms are in a race that is debilitating. Weaker companies in the commercial database business will find survival more and more difficult.
The notion of online habits persists. There’s a view that once a user has learned one way to perform an online or digital task, it’s game over for competitors. That’s not true. New customer constituencies come into being, and the people skilled in complex, specialized systems can carve out a niche. But hockey stick growth and fat margins are increasingly unlikely for traditional information companies.
IN-Q-TEL Investments: 2006 to April 2008
June 1, 2008
This table brings the summary of IN-Q-TEL investments through April 2008. You can access the investments from 2000 to 2003 here. The investments from 2004 and 2005 are here.
IN-Q-TEL Investments: 2004-2005
May 31, 2008
I’m delighted with the response to my table and links of IN-Q-TEL’s investments up to 2002. If you want to review this information, click here. In this essay, I want to provide the list of companies receiving funding in the two-year period from 2004 to 2005. As one of the people reviewing my list pointed out, there are some companies associated with IN-Q-TEL that do not appear in my table. My source is the publicly-accessible information on the IN-Q-TEL Web site. If you know of an investment that I have omitted, please, use the comments section of this Web log to share your information. I appreciate the numerous suggestions to make the list more useful. There is a limit to what we have time to assemble for a no-cost information resource. Please, tell me what you think would improve the utility of the list. If it’s lightweight, then I will consider altering the basic information in the table. The table appears after the jump.
Microsoft’s Enterprise Search Center
May 31, 2008
I’m working on a report about SharePoint search. I’ve been surprised by the appetite for SharePoint search information. Judging from the email I receive, SharePoint search is a topic that quite a few readers of this Web log find fascinating.
I’m neutral on the subject.
If you are one of the many who want to beef up the native search functions of SharePoint search, you will want to visit Microsoft’s Web site called Microsoft Enterprise Search here. In my experience, the Web log is more useful than some of the content on the main site. I prefer having marketing collateral separated from technical information.
The information is reasonably well labeled. You can download trials of various SharePoint search versions here. Click the package and you are sent to a download page. No registration hassles and none of the Web 2.0 design elements that I find more annoying than helpful. One useful link is the one to the Microsoft Enterprise Search Blog here.
You can download Microsoft’s desktop search tool as well. When you click on the Desktop Products link, you see a Vista-themed page. Click here to check it out. You have to click a “Get It Now” button to see the five variants that are on offer. I was a bit confused by the choice of Windows Live Toolbar and the MSN Search Toolbar. But you can decide for yourself. My eyes glaze over when I am forced to choose among many options.
Fast Financials: Three Day Old Fish Should Be Discounted
May 30, 2008
You may want to download the revised financials that are available today–May 30, 2008–on the Fast Search & Transfer Web site here. Information that I recall seeing on various Web sites is either no longer available or I lack the skills to locate the data. Mary-Jo Foley in her All about Microsoft Web log wrote a useful description of the implications of the deal when it was first announced. You can read this story here.
Some Fast Search corporate and general business information has been deleted because it was old or because it was deemed no longer of interest. Fortunately, I have a habit of downloading interesting documents when I first see them. Fast Search information is tough to locate using public Web sites for some reason. You can get these PDF documents directly from http://www.newsweb.no/index.jsp?messageId=209172. The explicit link from the Fast Search Web site with the pointer is here: http://www.fastsearch.com/news.aspx?m=329. Note: I am reluctant to post these documents because I am not certain of the Norwegian guidelines for this type of information.
A screen shot of the restated FY2007 data. I used this information plus the data in the FY2006 restated financials to make the table of numbers below.
A Walk Through
Fast Search’s top line revenues for the period from 2004 to 2007 are now reported as increasing from $66.4 million in 2004 to $143.0 million in 2007. That’s a jump of 115.4 percent. In the search engine game, the increase is good, but it does not match Google’s performance with its Google Search Appliance in the same period. Google went from zero revenue in 2004 to an estimated $400 million in the same period. (Note: Google reported $188 million for its enterprise unit, but I have calculated monies from its educational initiative, maps, and partner contributions in the form of sign-up fees, among other enterprise revenue flows.)
Year | Revenue (Restated) | Original Revenue Statement
2004 | $66,374,000 | $66,300,000
2005 | $98,069,000 | $100,300,000
2006 | $133,741,000 | $162,200,000
2007 | $142,979,000 | n.a.
Nothing too dramatic in this rundown except the sharp decrease in the FY2006 numbers. But what’s $30 million in today’s loosey-goosey financial world? When I looked at the restatements, however, I found the losses interesting.
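For readers who want to check the arithmetic behind the table, here is the calculation in a few lines. The figures are the ones reported above; nothing else is assumed.

```python
# Growth on the restated numbers, plus the size of each year's restatement.
restated = {2004: 66_374_000, 2005: 98_069_000, 2006: 133_741_000, 2007: 142_979_000}
original = {2004: 66_300_000, 2005: 100_300_000, 2006: 162_200_000}

growth = (restated[2007] - restated[2004]) / restated[2004] * 100
print(f"2004-2007 growth: {growth:.1f}%")  # 115.4%

for year in original:
    delta = restated[year] - original[year]
    print(year, f"restatement delta: {delta:+,}")  # 2006 shrinks by ~$28.5M
```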
A Warning Signal from Fast Search
I have a copy of the Fast Search & Transfer Mid Quarter Presentation by Joseph J. Lacson, dated December 2006. That document has some optimistic comments about Fast Search’s opportunities. The presentation is no longer available on the Fast Search Web site, but I have made a couple of screen shots from the presentation to give you a sense of what caught my attention. (Since the document is no longer available on the Web, you may want to skip my discussion of this information. I wish I could provide a link to the full document, but I don’t have permission to do that. I wrote Fast Search’s PR department, but I haven’t heard anything from them.)
X1 Technologies Dives into the SharePoint Search Channel
May 29, 2008
X1 Technologies blipped my radar when a source in Mountain View, California, told me that Yahoo inked a deal with X1 for search. You can learn more about X1 and its patented search technology here. The company’s tagline is “a single interface for secure business search”. I’ve been pleased with my X1 search experiences, and in my discussion of X1 as an option for IBM systems, I identified its technology as one well worth a close look.
The Yahoo Connection
Troubled Yahoo–despite its lousy ad system, Panama–has some sharp search and information retrieval wizards. When a point solution for search is needed, these wizards can pinpoint a vendor who can provide a quick fix for a findability ailment. Yahoo, for example, licensed the InQuira system to power the company’s customer support system. You are getting natural language help from InQuira, not Yahoo’s own search system. When Google aced Yahoo with email search, Yahoo’s engineers poked around and licensed Stata Labs’ technology. Yahoo can identify good technology, but the resulting patchwork is now a core weakness. Instead of an integrated search platform, Yahoo uses the Baskin-Robbins approach–many different flavors. Some flavors change without warning. The X1 solution deployed by Yahoo in its toolbar offered some useful features; namely, fast indexing and on-the-fly document display.
I took a look at X1 Technologies and learned that its engine indexed quickly. I found the interface geared to an email user, not a dinosaur like me. All in all, I liked the performance and the ability to filter results. Over the years, I tested different versions of the system and concluded that it was worth a look, particularly if the user community wanted an Outlook-type interface and zippy indexing.
X1: Signing Up with MSFT
I learned on May 27, 2008, that X1 made the jump into the Microsoft channel and its fast-moving currents. As you know, a company can sync up with Microsoft, send an engineer or two to Microsoft’s training courses, and demonstrate that its software doesn’t foul up SharePoint or some other “core” Microsoft product. In my experience, third-party software is often more stable than Microsoft’s “core” technology. A “hot fix” can produce some exciting SharePoint moments in my experience. I also enjoy SQL Server backups that appear to complete and then, upon testing, demonstrate a less-than-charming inability to rebuild the data set. Sigh.
X1 offered a desktop search system, free from Yahoo at one time and a modest charge if you bought the commercial version of the product. Now the company offers its X1 Enterprise Search Suite. The technical dope is here. The features of this Microsoft-certified system include:
- Ability to search the contents of Microsoft servers, including Exchange and SharePoint servers
- Federated results; that is, obtaining documents from different servers and displaying a single results list with duplicates removed
- Support for Microsoft’s security model, Microsoft clustering, etc.
- Connectors for more than 400 file types, including the Symantec Enterprise Vault.
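The federation bullet above can be sketched in a few lines. This is a generic illustration of result federation with duplicate removal, not X1’s implementation; the duplicate key (a case-normalized path) and the sample hits are invented.

```python
# Gather hits from several sources (e.g., an Exchange server and a
# SharePoint server) and collapse duplicates into one result list.
def federate(*result_lists):
    seen = set()
    merged = []
    for results in result_lists:
        for hit in results:
            key = hit["path"].lower()  # assumed dedup key: normalized path
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    return merged

exchange_hits = [{"path": "//mail/Budget.msg"}, {"path": "//files/plan.doc"}]
sharepoint_hits = [{"path": "//files/PLAN.DOC"}, {"path": "//sp/spec.docx"}]
print(len(federate(exchange_hits, sharepoint_hits)))  # 3, not 4
```

A real system also has to merge relevance scores across sources and honor each server’s security trimming, which is where most of the engineering effort goes.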
With more than 12,000 SharePoint licensees and a rumored 65 million users–an estimate I doubt–of SharePoint search, X1 joins a number of other prominent enterprise search vendors as Certified Gold partners.
Good Enough Means Trouble for Commercial Database Publishers
May 28, 2008
I began work on my new Google monograph. (I’m loath to reveal any details because I just yesterday started work on this project.) I will be looking at Google’s data management inventions in an attempt to understand how Google is increasing its lead over search rivals like Microsoft and Yahoo while edging ever closer to providing data services to organizations choking on their digital information.
As part of that research, I came across several open source patent documents that explain how Google uses the outputs of several different models to determine a particular value. Last week a Googler saw my presentation, which featured an illustrative output from a Google patent application, and, in a Googley way, accused me of creating the graphic in Photoshop.
Sorry, chipper Googler, open source means that you can find this document yourself in Google if you know how to search. Google’s system is pretty useful for finding out information about Google even if Googlers don’t know how to use their own search system.
How does Google make it possible for my 86-year-old father to find information about the town in Brazil where we used to live and allow me to surface some of Google’s most closely-guarded secrets? These are questions worth considering. Most people focus on ad revenues and call it a day. Google’s a pretty slick operation, and ads are just part of the secret sauce’s ingredients.
Running Scenarios

In my experience, it’s far more common for a team to use a single model and then run a range of scenarios. The high and low scenario outputs are discarded, and the remaining results are averaged. While not perfect, the approach yields a value which can be used as is or refined as more data become available to the system. Google’s twist is that different models each generate an answer.
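The conventional single-model practice I describe amounts to a trimmed mean over scenario outputs. A minimal sketch, with invented numbers:

```python
# Run a range of scenarios through one model, discard the high and low
# outputs, and average the rest.
def scenario_estimate(outputs):
    if len(outputs) <= 2:
        raise ValueError("need more than two scenario outputs")
    trimmed = sorted(outputs)[1:-1]  # drop the extremes
    return sum(trimmed) / len(trimmed)

# One wild scenario (100) is discarded along with the low one (8).
print(scenario_estimate([8, 10, 12, 14, 100]))  # 12.0
```

The trim is what keeps one runaway scenario from dragging the estimate around, which is exactly the robustness a single-model shop is buying.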
This diagram shows how Google’s policy of incremental “learnings” allows one or more algorithms to become more intelligent over time.
The output of each model is mathematically combined with the other models’ outputs. As I read the Google engineers’ explanations, it appears that using multiple models generates “good enough” results, and it is possible, according to the patent document I am now analyzing, to replace models whose data are out of bounds.
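My reading of the multi-model approach can be sketched as follows. This is my interpretation, not the patent’s actual mathematics; the bounds test, the simple averaging, and the model names are all assumptions.

```python
# Combine the outputs of several independent models, dropping any model
# whose output falls outside plausible bounds before averaging.
def combine(model_outputs, lower, upper):
    in_bounds = [v for v in model_outputs.values() if lower <= v <= upper]
    if not in_bounds:
        raise ValueError("every model is out of bounds")
    return sum(in_bounds) / len(in_bounds)

# model_c has gone haywire; it is excluded rather than averaged in.
outputs = {"model_a": 0.62, "model_b": 0.58, "model_c": 9.99}
print(combine(outputs, lower=0.0, upper=1.0))  # ~0.6
```

The structural difference from the scenario practice is that here the diversity comes from distinct models rather than distinct inputs to one model, so a misbehaving model can be swapped out without retuning the rest.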