Inktomi and Fast Search: Two Troubled Search Companies, One Lesson

May 8, 2012

I found the write up by Diego Basch interesting and thought provoking. I have a little experience with Inktomi. For the original FirstGov.gov system, the US government used Inktomi for the public facing index of US government unclassified information. (FirstGov.gov is now www.usa.gov.)

In 2000, Inktomi had a “ready to go” index of content from Dot Gov Web sites. The firm’s business model matched the needs of the US government. There were the normal contracting and technical hurdles for a modestly sized US government project with a fairly tight timeline. No big deal. Job done. Inktomi worked.

When I read “A Relevant Tale: How Google Killed Inktomi,” I thought the write up had some useful information. However, I don’t think Google killed Inktomi or any other search system. Google did not kill Fast Search & Transfer, Excite, HotBot, or any other search system in its rise to its alleged 65 percent share of the search market. (Google share is actually much higher, based on my analyses.)


Excite’s early 1997 attempt at portalization. Can you spot the search box? Does this look like the current version of Google? Say, “No.” Now log into Google and run a query for rental car. Now do you see the similarity between the early portal craziness and the modern Google? I do.

What killed off these outfits was their business models. Let me explain using Inktomi and Fast Search as examples. I could cite other cases, but these two are okay for a free blog post for the two or three readers I have.

Inktomi, for whatever reason, concluded that people wanted to offer search, not do the heavy lifting. In the portal fever that was raging from 1998 to 2001, Web sites wanted to be the “front page” of the Internet. The result was that America Online, Excite, Lycos, and Yahoo among others jammed links on the splash page. At one time, I counted more than 60 links on the Excite home page. Once I hit 50 links, I quit counting. My eyes and patience can cope with three to five things. More than that, and I move on.

Inktomi’s analysts did the spreadsheet fever thing, making assumptions about how many Web sites would license Inktomi results, pay Inktomi’s fees, and generate revenue from the front page of the Internet craziness. The reality was that Inktomi did not have enough customers to support the cost of the spidering, bandwidth, investment in performance, research and development for precision and recall, and the other costs that are underestimated or just ignored. The result was the collapse of the company.


No Big Deal: Beyond Search Passes 8,000 Articles

May 6, 2012

Beyond Search began in January 2008. I wanted to find a way to keep track of the most interesting news which I had been placing in my Overflight system. You can see some of the Overflight functionality at www.arnoldit.com/trax or www.arnoldit.com/taxonomy. A few days ago, Beyond Search passed the 8,000 post mark. You can search the archive of content using either the site search system, provided by Blossom.com, or the Google Custom Search Engine which indexes site content plus the links Beyond Search editors include in stories. Blossom is the search box at the top of the page. The CSE is labeled “Google.”

You can use the content to track a leading vendor; for example, enter the query “Autonomy” in the site specific search box and you see the events which we consider significant. You can also get my personal views on online products and services. Just run a query for “mysteries of online.” You can use the categories to limit a display to indexed content. No index is perfect, but you can look at a result set for a hot topic like “indexing” with a mouse click or two.

Now about the content.

First, I am not running a news operation. In fact, I don’t do news. Neither my editorial team nor I are real journalists. I am supposed to know about medieval religious sermons in Latin. The writers are mostly librarians or researchers who have been trained to produce the equivalent of a debate note card. I learned how to prepare 5×8 inch note cards when I returned to the US from Brazil and entered a wonderful American high school. Let’s see. That was in 1957 or 1958. In short, I have been doing one thing as my core research method for more than 50 years. Do you think I am going to change because a PR maven, an unemployed middle school teacher, an English major turned search expert or a Panda wants me to? In case you don’t know the answer, the answer is, “No.”

Second, we run sponsored content. We use Google AdSense. We run ads for companies who want to get a message in front of my two or three readers. I wish I knew what the business model for Beyond Search is, but the content continues to flow, seven days a week, year round. When I was in intensive care in January for more than a week, the content flowed. I know one of the editors smuggled my laptop into the hospital lock up where I was. We kept publishing. Those working on the blog just kept on going. My writing was given an extra cycle of editing because I was, quite literally, close to being a gone goose. Keep in mind that the only difference between a note card content object and sponsored content is that the subject of the write up gets a chance to provide input to an editor. The ironic or cynical comments remain. If I get fascinated with a topic, I write about it or get one of the editors to produce content objects on the subject. If you find that certain topics get covered and then dropped, it is because I lose interest. You want news? Find a real journalist. Examples of what I follow and then drop range from European search systems to ways to federate the text and numeric data associated with building a fungible product like a personal computer.

Third, I am usually biased, often incorrect, and completely indifferent to the hottest trends that azure chip consultants pump out to sell consulting work. If you read the content in Beyond Search or any of the blogs which we produce, you have the obligation to think about what we present and make your own judgment about its usefulness, accuracy, or appropriateness for your particular situation.

Fourth, I use the content in Beyond Search for my columns in Enterprise Technology Management magazine, Online magazine, Information Today (a library oriented tabloid), KMWorld (an enterprise information tabloid), and Searcher magazine (a specialist publication for people who know how to use the old fashioned Dialog and Lexis systems). The content in my for fee articles is closer to the type of reports I prepare for my one or two clients. I am not a great writer. I try to look at popular or emerging technical trends and put them into the frame of my experience. If you want stories that reinforce received wisdom, you will find Beyond Search inappropriate for your needs. In my for fee columns, I knit together a number of items of information and interpret those items in a business context. The for fee columns, therefore, go beyond what is in the free blog.

My plan is to keep the information stream flowing and free. If you have a comment to make about the point of view or the information in a content object (my word for article or story), use the comments section of the blog. If you write me with spam, silly news releases, and baloney I did not specifically request—be advised: I may write about what I call “desperation marketing.” Don’t like the term? Well, I do, and it is accurate. The facile notion of “pivoting” a company is mostly marketing baloney. I don’t like baloney.

For more information about the editorial policies or how to contact us to get access to our two or three readers, navigate to the About page.

Stephen E Arnold, May 6, 2012

Sponsored by Stephen E Arnold

Need to Search Quark Files?

May 4, 2012

Need to search within Quark desktop publishing files? Here’s your solution—Quark File Search and Indexing 1.0, a Windows Search add-on. The product description reads:

“Quark File Search & Indexing is a plugin that enhances the Windows Search function in order to instantly find QuarkXpress documents.
“You can use this utility when you manage a big number or QuarkXpress documents and need to locate them quickly.”

The application works with QuarkXpress version 6 or higher, and is produced by MetaDesign Solutions. The company, headquartered in Haryana, India, custom-builds a wide variety of robust IT solutions for its customers. They pride themselves on their transparency and clear communication.

Cynthia Murrell, May 4, 2012

Sponsored by PolySpot

Yammer Embraces Search

May 2, 2012

An enterprise social vendor is jumping into search: BrainyardNews announces, “Yammer Update Emphasizes Enterprise, Cloud Search.” Since search vendors are jumping into almost anything with the merest whiff of money, I guess it makes sense for enterprise social network provider Yammer to pursue search. BrainYard editor David F. Carr writes:

“Yammer is introducing ‘universal search,’ along with an option for project or interest groups within a Yammer enterprise social network to sign up for services without necessarily enlisting the company as a whole. . . . To Yammer, universal search makes it possible to search across connections to both enterprise and cloud-based systems integrated with a Yammer network. For example, a search by customer name might turn up automated updates from Salesforce.com, SAP, and a Microsoft SharePoint site, as well as posts by users about that company.”

Uniquely, Yammer saves space by indexing only the metadata coming into a feed, rather than the underlying data, though full-text indexing may appear in the future. The basic social network service is free, and a la carte pricing for premium options gives customers some flexibility.
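The metadata-only approach is easy to picture with a toy example. The sketch below is an illustration under stated assumptions, not Yammer's implementation: the field names (`source`, `title`, `customer`), the sample feed, and both function names are hypothetical. It builds an inverted index from the metadata of each feed item and never touches the body text.

```python
from collections import defaultdict

# Hypothetical metadata fields; the real feed schema is not described in the article.
METADATA_FIELDS = ("source", "title", "customer")

def build_metadata_index(feed_items):
    """Map each metadata token to the ids of the items that carry it.

    The "body" field is deliberately skipped, so the index stays small.
    """
    index = defaultdict(set)
    for item in feed_items:
        for field in METADATA_FIELDS:
            for token in item.get(field, "").lower().split():
                index[token].add(item["id"])
    return index

def search(index, query):
    """Return ids of items whose metadata contains every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)

# Made-up feed items standing in for updates from connected systems.
feed = [
    {"id": 1, "source": "Salesforce", "title": "Opportunity updated",
     "customer": "Acme", "body": "Long record text ..."},
    {"id": 2, "source": "SharePoint", "title": "Contract draft",
     "customer": "Acme", "body": "Full document text ..."},
    {"id": 3, "source": "SAP", "title": "Invoice posted",
     "customer": "Globex", "body": "Line items ..."},
]
index = build_metadata_index(feed)
print(search(index, "acme"))           # items tagged with the customer name
print(search(index, "acme contract"))  # both terms must match
```

The trade-off is visible in the last field: a word that appears only in `body` is not findable, which is the gap full-text indexing would close.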

The new features are part of the spring update Yammer released this month. Other components include: a new tagging method; a Web part that integrates with MS Office 365; updated mobile apps; and the Yammer Embed feature, now moving up from its beta existence.

Launched in 2008, Yammer pioneered the use of secure, private social networks for the purpose of collaboration. More than 80% of Fortune 500 companies currently use the company’s services.

Cynthia Murrell, May 2, 2012

Sponsored by PolySpot

Q-Sensei 2.0

April 27, 2012

Q-Sensei adds features to its ontology-based search system, we learn in MarketWatch’s “Q-Sensei Enterprise V2.0 Unveiled to Rapidly Develop Tailored Search applications for Big Data.” Prominently featured are an ontology-based data processing/configuration and a new API to more efficiently handle big data.

What’s an ontology? We keep forgetting. The dictionary says it’s “the branch of metaphysics that studies the nature of existence or being as such.” Wait, that can’t be right. . . . Ok, in information system lingo, ontology “formally represents knowledge as a set of concepts within a domain, and the relationships between those concepts.” That’s better.
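For those of us who keep forgetting, the information system definition boils down to two things: a set of concepts and a set of typed relationships between them. Here is a minimal sketch of that idea. The concept names, the relation names, and the helper functions are all made up for illustration; this is not RDF, OWL, or anything from Q-Sensei.

```python
# A toy ontology: concepts plus (subject, relation, object) triples.
# Every name below is hypothetical and chosen only for illustration.
ontology = {
    "concepts": {"SearchSystem", "EnterpriseSearch", "Index", "Document"},
    "relations": [
        ("EnterpriseSearch", "is_a", "SearchSystem"),
        ("SearchSystem", "uses", "Index"),
        ("Index", "points_to", "Document"),
    ],
}

def related(onto, concept, relation):
    """Return the concepts reachable from `concept` through one `relation` link."""
    return [obj for subj, rel, obj in onto["relations"]
            if subj == concept and rel == relation]

def ancestors(onto, concept):
    """Follow is_a links upward and collect every broader concept."""
    found = []
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for parent in related(onto, current, "is_a"):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found

print(ancestors(ontology, "EnterpriseSearch"))
print(related(ontology, "SearchSystem", "uses"))
```

A search application can use the `is_a` chain to broaden a query: a hit tagged with a narrow concept also counts for its broader ancestors.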

The press release says the newest version of Q-Sensei’s enterprise search platform is designed to tailor search-based applications quickly and flexibly to the needs of its clients, using data from Intranets, social media, third parties, and the Internet. We learn from the write up:

“With Q-Sensei Enterprise’s new ontology-based data processing, businesses can rapidly develop new, tailored search-based applications by using existing RDF and OWL resources such as database models, industry or domain-specific ontologies, process definitions and project configurations. This new processing approach also enables harmonization of semantics, components and functionality across business applications. It also improves the speed and efficiency of data process and indexing, increasing platform performance.”

Version 2.0 also boasts a semi-automatic, guided configuration and a new API that makes it easier to integrate Q-Sensei into other applications.

Q-Sensei was created in 2007 with the merger of the German Lalisio and the American QUASM, and now has offices in both Brooklyn and Erfurt, Germany. Q-Sensei focuses on multi-dimensional search, which it defines as combining full-text and dynamic faceted search with real-time content analysis. The company maintains that its solutions make it easy to find what you need, even if you don’t have the appropriate keywords on hand.

Cynthia Murrell, April 27, 2012

Sponsored by Ikanow

IBM Buys Vivisimo Allegedly for Its Big Data Prowess

April 25, 2012

Big data. Wow. That’s an angle only a public relations person with a degree in 20th century American literature could craft. Vivisimo is many things, but a big data system? News to me for sure.

IBM has been a strong consumer and integrator of open source search solutions. Watson, the game show winner, used Lucene with IBM wrapper software to keep the folks in Jeopardy post production on their toes.


A screen shot of the Vivisimo Velocity system displaying search results for the RAND organization. Notice the folders in the left hand panel. The interface reveals Vivisimo’s roots in traditional search and retrieval. The federating function operates behind the scenes. The newest versions of Velocity permit a user to annotate a search hit so the system will boost it in subsequent queries if the comment is positive. A negative rating on a result suppresses that result.

I learned that IBM allegedly purchased Vivisimo, a company which I have covered in my various monographs about search and content processing. Forbes ran a story which was at odds with my understanding of what the Vivisimo technology actually does. Here’s the Forbes title: “IBM To Buy Vivisimo; Expands Bet On Big Data Analytics.” Notice the phrase “big data analytics.”

Why do I point out the “big data” buzzword? The reasons include:

  • Vivisimo has a clustering method which takes search results and groups them, placing the results the method identifies as similar in “folders”
  • Vivisimo has a federating method which, like Bright Planet’s and Deep Web Technologies’, takes a user’s query, sends the query to two or more indexing systems, retrieves the results, and displays them to the user
  • Vivisimo has a clever de-duplication method which makes the results list present a duplicated item only once. This is important when one encounters a news story which appears on multiple Web sites.
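The three functions in the list above can be sketched in a few lines. Treat this as a rough illustration and not as Vivisimo's method: the two engines are hypothetical stand-ins that return canned results, de-duplication here is a simple match on normalized titles, and the "clustering" is a keyword lookup against folder names I made up, where a real system derives the groups from the results themselves.

```python
from collections import defaultdict

def federate(query, engines):
    """Send the query to each engine and merge the result lists."""
    merged = []
    for engine in engines:
        merged.extend(engine(query))
    return merged

def normalize(title):
    """Lowercase a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())

def deduplicate(results):
    """Keep only the first result seen for each normalized title."""
    seen = set()
    unique = []
    for result in results:
        key = normalize(result["title"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique

def cluster(results, folders):
    """Put each result in the first folder whose keywords appear in its title."""
    grouped = defaultdict(list)
    for result in results:
        words = set(normalize(result["title"]).split())
        label = next((name for name, keywords in folders.items()
                      if words & keywords), "other")
        grouped[label].append(result["url"])
    return dict(grouped)

# Hypothetical engines returning canned hits for the query "cell".
def engine_a(query):
    return [{"title": "Cell biology basics", "url": "a.example/1"},
            {"title": "Prison cell design", "url": "a.example/2"}]

def engine_b(query):
    return [{"title": "Cell  Biology Basics", "url": "b.example/9"},
            {"title": "Battery cell chemistry", "url": "b.example/3"}]

folders = {"biology": {"biology"}, "battery": {"battery"}, "prison": {"prison"}}
results = deduplicate(federate("cell", [engine_a, engine_b]))
print(cluster(results, folders))
```

The same story returned by both engines shows up once, and the surviving hits land in three folders. Useful plumbing for search and retrieval, but none of these steps requires a large data set.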

According to the write up in Forbes, a “real” news outfit:

IBM this morning said it has agreed to acquire Vivisimo, a Pittsburgh-based provider of big data access and analysis tools.

Okay, but in Beyond Search we have documented that Vivisimo followed this trajectory in its sales and marketing efforts since the company opened for business in 2000. In fact, the Wikipedia write up about Vivisimo says this:

Vivisimo is a privately held enterprise search software company in Pittsburgh that develops and sells software products to improve search on the web and in enterprises. The focus of Vivisimo’s research thus far has been the concept of clustering search results based on topic: for example, dividing the results of a search for “cell” into groups like “biology,” “battery,” and “prison.” This process allows users to intuitively narrow their search results to a particular category or browse through related fields of information, and seeks to avoid the “overload” problem of sorting through too many results.


Google and Its Stock Split

April 16, 2012

I pointed out that the big news from the Google quarterly report was the erosion of revenue from Google’s core business.

Other addled geese, poobahs, and mavens found the stock split more troubling. A good example of the reaction is this Reuters real news story: “Google’s Evil Stock Split.” The idea is that Google seems to be perilously close to violating guidelines put in place 90 years ago. Here’s the key point in my opinion:

Google has, now, clearly violated the spirit of the NYSE rules, if not their letter. It took 15 months for the independent directors on the board to be persuaded of this, in long and secret deliberations.

Well, the independent directors * were * convinced.

I also enjoyed this comment in the Reuters real news story:

This move, then, is basically a way for Google to try to retreat back into its pre-IPO shell as much as possible. It never really wanted to go public in the first place — it was forced into that by the 500-shareholder rule — but at this point, Google is far too entrenched in the corporate landscape to be able to turn back the clock. It’s too big, and too important, and has been public for too long. That’s the thing about going public: it might suck, but once you’ve done it, you’ve done it. And at that point, if you try to pull a stunt like this, you risk looking all too much like Rupert Murdoch.

Okay, real Silicon Valley is starting to look like the real news paragon, Rupert Murdoch.

Wow.

My take is very simple.

The Googlers know that revenue softening can no longer be swept under the rug or surrounded with big band music and fancy dancing. The numbers are too big. The declines are double digits. The grousing about Panda and the push to get people to buy AdWords are too visible to some Web site operators.

Therefore, the stock play is designed to leave the existing management team in charge as the financial news gets increasingly dodgy. The Google senior management team does not want to be looking at a start up to fund without the Google ID card in their pocket.

So the erosion of online ad efficiency is causing the control push. Because this has been going on among the independent directors, I have concluded that the revenue erosion was noticeable in 2010, maybe earlier.

Will control reverse the online advertising money machine’s functioning? Nah, but the days of the “Google can do no wrong” are either over or drawing to a close. Google has these issues with which to contend:

  1. Legal hassles. Big disc brake applied to some activities.
  2. Amazon, Apple, and Facebook. Each of these companies has learned from Google. This is The Google Legacy I wrote about back in 2004 or 2005. You might want to check it out because Amazon, Apple, and Facebook have out Googled Google and seem to be gaining strength as Google does the fancy dancing.
  3. Costs from brute force solutions. Google spends a lot of dough to keep its brute force indexing system up and running. Facebook, on the other hand, can just spider Web urls which its members have posted. No brute force required to get started with an interesting search solution. Amazon has slapped A9 in the AWS plumbing and can move into search niches where Google has not gotten significant traction. Apple, which Google really wants to emulate, keeps chugging along with a walled garden and customers’ religious fervor. Do you know anyone with religious fervor toward the Google? Well, I know one company. Oracle. See item one above.

Net net: Blekko/Yandex and Facebook could put the squeeze on the Google with a little luck and some good timing. How will Google respond? No clue. Google is not accustomed to playing defense. Ego is a potent concept. As the Greek tragedian said:

Cleverness is not wisdom. Bacchæ l. 395

Stephen E Arnold, April 18, 2012

Sponsored by Pandia.com

Boxfish Brings Search to TV

April 16, 2012

Technology Review recently reported on a new startup that helps users search for words and phrases from TV in the article “Searching the Small Screen.”

According to the article, as of late March, California based Boxfish opened a beta version of its site to the public, allowing users to search through words and phrases that have been seen on television over the past month. The site also allows users to see topics that are trending and set up alerts for specific terms.

Boxfish is currently indexing TV dialogue from the US, UK, and Ireland and plans to add Australia and Canada soon.

The article states:

“The site is simple to use. If you search for, say, “cookie,” you’ll receive a list of results posted in chronological order along with a bit of the transcript in which the word appeared. On the right side of the screen you can see how many times it has been used recently, on how many channels, and also the words most commonly used in the same context. Click on a search result and you’ll see a big chunk of the transcript with bold text indicating the section that includes the search term.”
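The behavior described in the quote (matching lines in time order, a count of mentions, a count of channels, and the words that keep company with the search term) can be approximated with a short sketch. Everything here is an assumption for illustration: the transcript records, the channel names, the stop word list, and the function name are invented, and the sort is oldest first because the article does not say which direction Boxfish uses.

```python
from collections import Counter

# Made-up transcript lines; Boxfish's actual data format is not described.
transcripts = [
    {"time": "2012-04-10T09:00", "channel": "News One",
     "text": "the cookie recipe won the bake off"},
    {"time": "2012-04-12T18:30", "channel": "Food TV",
     "text": "add chocolate to the cookie dough"},
    {"time": "2012-04-11T12:15", "channel": "News One",
     "text": "markets were quiet today"},
]

# A tiny, hypothetical stop word list so the context words are informative.
STOPWORDS = {"the", "to", "a", "were", "of"}

def search_transcripts(term, lines):
    """Find lines containing `term` and summarize where and how it was used."""
    term = term.lower()
    hits = sorted(
        (line for line in lines if term in line["text"].lower().split()),
        key=lambda line: line["time"],  # ISO timestamps sort chronologically
    )
    channels = {line["channel"] for line in hits}
    context = Counter(
        word
        for line in hits
        for word in line["text"].lower().split()
        if word != term and word not in STOPWORDS
    )
    return {
        "hits": hits,
        "matching_lines": len(hits),
        "channels": len(channels),
        "context": context.most_common(3),
    }

summary = search_transcripts("cookie", transcripts)
for hit in summary["hits"]:
    print(hit["time"], hit["channel"], hit["text"])
print(summary["matching_lines"], "lines on", summary["channels"], "channels")
```

An alert for a specific term is then just this query run on a schedule against the newest transcript lines.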

Since the product is so new, Boxfish still has a few kinks to work out. However, this could be a cool new way for TV watchers to keep up with anything from politics and current events to the latest celebrity gossip.

Jasmine Ashton, April 16, 2012

Sponsored by Pandia.com

Desktop Search Moves to the Cloud

April 12, 2012

TechCrunch’s Colleen Taylor recently reported on a new app called Found, which lets you find and access your documents whether they are on your computer or online, in the article “Found Makes Searching for Files Anywhere Super Simple (and Really Sick).”

According to the article, the San Francisco based app aims to organize the mess of documents that are relevant to our work and personal lives. Found currently plugs into Gmail, Google Documents, and Dropbox and the company says that it will be adding additional integrations in the near future.

Taylor states:

“Once you install it on your computer, looking for things in Found quickly becomes second nature — and you quickly start to wonder about how much time you wasted searching for things before you had it. Of course, the real key will be seeing how snappy the Found app is once more people are using it after the public launch later this spring — nowadays, an app is only as good as it can scale. But at the moment, Found is looking very like a very promising tool for the those of us who are a bit less organized with our files than we’d like to be.”

While the app won’t be released to the public until mid-May, you can see how Found works via an embedded video in Taylor’s article. The notion of a cloud service indexing content on a local machine may give some users pause. We prefer to use behind-the-firewall solutions. Even cloud back ups are solutions which don’t address the issues we face.

Jasmine Ashton, April 12, 2012

Sponsored by Pandia.com

Open Source Analytics Information Service Now Available

April 9, 2012

ArnoldIT has rolled out The Trend Point information service. Published Monday through Friday, the information service focuses on the intersection of open source software and next-generation analytics. The approach will be for the editors and researchers to identify high-value source documents and then encapsulate these documents into easily-digested articles and stories. In addition, critical commentary, supplementary links, and important facts from the source document are provided. Unlike a news aggregation service run by automated agents, librarians and researchers use the ArnoldIT Overflight tools to track companies, concepts, and products. The combination of human-intermediated research with Overflight provides an executive or business professional with a quick, easy, and free way to keep track of important developments in open source analytics. There is no charge for the service.


Stories include:

According to the publisher, Stephen E Arnold:

We believe that commercial abstracting and indexing services have become untenable for the busy professional. We have combined traditional indexing, literature reviews, and critical commentary which help reduce the time required to pinpoint the meaningful information in this exploding open source analytics field.

Our business model is to provide high value information without a fee. Individuals, law firms, and private equity firms wanting additional information about the people, companies, and products we cover are free to contact us. Like other professional services firms, we rely on motivated individuals with an information need to tap into our full-scale, in-depth research.

What sets TheTrendPoint and other ArnoldIT.com information services apart is that their approach is similar to that used by commercial information services such as Medline and Disclosure, two information services designed to make reference services more useful.

At this time, TheTrendPoint.com is designed to complement the finding services which ArnoldIT.com publishes. ArnoldIT.com is one of the leading sources of information on subjects ranging from search and content processing to next-generation intelligence systems.

New content is added to the service Monday to Friday. For more information about the service, contact the publisher at seaky2000 at yahoo dot com.

Kenneth Toth, April 9, 2012

Sponsored by Pandia.com
