At the Top of the BI Stack

June 28, 2013

Business intelligence tools are becoming a big priority for even small businesses. TopCultured supplies some guidance for those considering their options in “The 4 Biggest Business Intelligence Companies.” We were a little surprised that writer Drew Hendricks included Microsoft on this list.

The write-up begins:

“Finding the meaning behind mountains of raw data can be a difficult task, especially for companies that have not been monitoring their processes on a regular basis. Keeping an eye on business intelligence can tell stories of new opportunities, potential verticals for growth, and identify dangerous problems, allowing companies to enact a solution.

“As business intelligence becomes more accessible to smaller companies and startups, with app developers driving mobile solutions, the need for BI-trained workers and software solutions goes up. Take a look at the four top business intelligence companies out there now.”

With that, the list begins. Roambi is lauded for being easy to use and interpret. YellowFin boasts a bird’s-eye view of a company’s strengths and weaknesses. In at number three, Domo is flexible enough to be used throughout an organization. Microsoft’s SharePoint—well, I suppose being “considered the industry standard” does give the veteran platform some standing.

See the article for more on each of these companies. Organizations would do well to carefully consider their needs and investigate all options before choosing a BI platform.

Cynthia Murrell, June 28, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

HP, Autonomy, and a Context Free Expert Output about Search: The Bet on a Horse Approach to Market Analysis

May 4, 2013

I don’t think too much about:

  1. Azure chip consultants. You know, these are the firms which make a living from rah rahs, buzzwording, and pontification to sell reports. (I know. I labored at a non-azure chip outfit for what seems like decades. Experience is a good instructor. Oh, if you are a consultant, please, complain about my opinion using the comments section of this free blog.)
  2. Hewlett Packard. I recall that the company used to make lab equipment which was cool. Now I think the firm is in some other businesses but as quickly as I latch on to one like the Treo and mobile, HP exits the business. The venerable firm confuses my 69-year-old mind.
  3. Autonomy. I think I did some work for the outfit but I cannot recall. Age and the lifestyle in rural Kentucky take a toll on the memory, I admit.

Nevertheless, I read “HP’s Autonomy Could Face Uphill Battle In Data Market.” There were some gems in the write up which I found amusing and illustrative of the problems which azure chip consulting firms and their experts have when tackling certain business issues.

The main idea of the write up for “investors” is that HP faces “challenges.” Okay. That’s a blinding insight. As you may recall, HP bought Autonomy for $11 billion and then a few months later roiled the “investors” by writing off billions on the deal. That was the mobile phone model, wasn’t it?

The write up then pointed out:

HP wanted Autonomy to jump-start its move into software and cloud-based computing. Autonomy is the No. 1 provider of search and retrieval software that companies use to find and share files and other information on their websites and document management systems.

Okay. But that too seems obvious.

Now here comes the kicker. The expert outfit providing inputs to the reporter doing the bull dog grip on this worn out bone is quoted as saying:

“Software license revenue (in this market) isn’t growing at the same rate as before, and we are beginning to see the rise of some new technologies, specifically content analytics and unified information access,” Schubmehl said. These new types of software can be used with types of business analytics software, business intelligence software and other software to help enterprises do a better job of locating specific information, he says, which is the job of search retrieval software.

I don’t know much about IDC but what strikes me from this passage is that there are some assertions in this snippet which may warrant a tiny bit of evaluation.


Will context free analyses deliver a winner? Will there be a Gamblers Anonymous for those who bet on what journalists and mid tier (second string) consultancies promulgate? For more about Gamblers Anonymous navigate to http://www.gamblersanonymous.org/ga/

Here goes:

Read more

Now You Are Talking: Can a Company Make Money with Enterprise Search?

January 22, 2013

I have better things to do than to capture my immediate thoughts about “Inside H-P’s Missed Chance to Avoid a Disastrous Deal.” You can find the article in a dead tree version of the Wall Street Journal on page 1 with a jump to Page 16, where the “would not comment” phrase appears with alarming frequency.

The most interesting point in the write up is the quote, allegedly crafted by a Hewlett Packard Big Dog:

Now you’re talking.

Like much of the chatter about search, content processing, and Big Data analytics, on the surface these information retrieval software companies are like Kentucky Derby hopefuls on a crisp spring morning. The big pay day is two minutes away. How can the sleek, groomed, documented thoroughbreds lose?

The reality, documented in the Wall Street Journal, is that some companies with sure fire winning strategies can win. Now you’re talking.

How did HP get itself into the headline making situation? How can smart folks spend so much money, reverse course, and appear to be so scattered? Beats me.

I have, however, seen this before. As I read the Wall Street Journal’s story, I wrote down some thoughts in the margin of the dead tree instance of the story at the breakfast table.


A happy quack to Lubrisyn.com

Herewith are my notes to myself:

First, name one search vendor in the period from 1970 to the present which has generated more than $1 billion in revenue from search. Acquisitions like IBM’s purchase of iPhrase (er, what happened to that outfit?), Vivisimo (now a Big Data company!), or SPSS’s Clementine (ah, you don’t know Clementine? Shame on you.) do not change the picture. Don’t toss Google and its search appliance into the mix. Google only hints at the great success of the product. When was the last time you searched using a Google Search Appliance?

Second, didn’t Microsoft purchase Fast Search & Transfer for $1.2 billion in January 2008? How is that working out? The legions of search add-in vendors for SharePoint are busy, but the core system has become a little bit like dear old Clementine. Fast Search was the subject of a couple of probes, but the big question which has not yet been answered as far as I know is, “How much revenue did Fast Search generate versus how much revenue Fast Search reported?” I heard that the revenues were, to some degree, inflated. I thought search was a sure fire way to make money.

Third, after more than a decade of top down marketing, why did Endeca need cash infusions from Intel and SAP venture units? How much did Oracle pay for Endeca? Some azure chip consultants have described Endeca as the leading vendor of enterprise search. Endeca added ecommerce and business intelligence to its lineup of products. What was the firm’s revenue at the time of its sale to Oracle? I estimated about $150 million.

Fourth, Dassault, the company with the “system”, bought Exalead. What has happened to this promising technology? Is Exalead now a $200 million a year revenue producer for the prestigious French engineering firm? Perhaps the “system” has been so successful that Exalead is now infused into Dassault clients throughout the world? On the other hand, wouldn’t a solution with this type of impact make headlines every week, even in the US? Is it more difficult to cultivate information retrieval revenues than other types of software revenue? The good news is that Dassault paid a reasonable price for Exalead, avoiding the Autonomy, Endeca, and Fast Search purchase prices.

These examples raise a question: even if my estimates are wide of the mark by 20 or 30 percent, how could any company generate the astounding growth required to pay the $11 billion acquisition cost, invest in search technology, and market a product which is pretty much available for free as open source software today? Answer: Long shot. Exercise that horse and make sure you have what it takes to pay the jockey, the stable hands, the vet, and the transportation costs. Without that cash cushion, a Derby hopeful will put a person in a financial hole. Similar to search dreams of big acquirers? Yep. Maybe identical?

Two different points occurred to me.

On one hand, search and its bandwagon riders like Big Data analytics must seem to be a combination of the Klondike’s mother lode and a must-have function no matter what a professional does for a living. The reality is that of the 65 search and related vendors I have written about in my books and confidential reports, only three managed to break the $100 million in search revenue ceiling. The companies were Autonomy, Endeca, and Fast Search. Of the three, only Endeca emerged relatively unscathed from the process. The other 62 companies either went out of business (Convera, Delphes, Entopia) or stalled at revenues in the millions of dollars. If one totals the investments in these 65 firms against the revenues they generated, search is not a break-even investment. Companies like Attivio and Coveo have captured tens of millions of venture dollars. Those investors want a return. What are the odds that these companies can generate more revenues than Autonomy? Interesting question.

On the other hand, search and its child disciplines remain the most complex of modern computing problems. Whether it is voice to text to search and then to predictive analytics for voice call intercepts or just figuring out what Buffy and Trent in the sales department need to understand a new competitor, software is just not up to the task. That means, the thinking goes, that money pumped into promising companies will pay big dividends. Now the logic may make sense to an MBA, but I have spent more than 35 years explaining that progress in search is tough to achieve, expensive to support, and disappointing to most system users. The notion that a big company could buy software that is essentially customized to each customer’s use cases (notice the plural of “cases”) and make big money is a belief characteristic of many firms and managers. The reality is that even governments lack the money to make search work.

Don’t get me wrong.

There are small firms which because they focus on quite specific problems can deliver value to a licensee. However, big money assumes that search technology will be a universal, easily applied to many situations. Even Google, with its paid search model, is now facing innovation challenges. With lots of smart people, Google is hiring the aging wizards of search in an attempt to find something that works better than the voting methods in use today.

What do my jottings suggest? Search is a tough business. Assumptions about how much money one can make from search in an era of open source options and cost cutting need to be looked at in a different way. The current approach, as the Wall Street Journal write up makes clear, is not working particularly well. Does this search revenue track record suggest that the azure chip consultants, former middle school teachers, and real journalists miss the larger message of search, content processing, and Big Data analytics? My tentative answer is, “Yep.”

Stephen E Arnold, January 22, 2013

Get A Comprehensive Search Strategy Plan from Aspire

October 12, 2012

People tend to doubt the power of a good search application. They take it for granted that every out-of-the-box and Internet search engine is as accurate as Google (merely the most visible one in the public eye). The truth of the matter is that most businesses are losing productivity because they have not harnessed the true potential of search. Search Technologies, a leading IT company that specializes in search engine implementation, managed services, and consulting, is the innovator behind Aspire:

“Aspire is a powerful framework and application platform for acquiring both structured and unstructured data from just about any content source, processing / enriching that content, and then publishing it to the search engine or business analytics tool of your choice.”

Aspire uses a built-in indexing pipeline and proprietary code maintained to Search Technologies’ high standards. It is based on Apache Felix, the leading open source implementation of the OSGi standard. OSGi is built for Java and supported by IT companies worldwide. Aspire can gather documents from a variety of sources, including relational databases, SharePoint, file systems, and many more. The metadata is captured and can then be enriched, combined, reformatted, or normalized to whatever the business needs before it is submitted to search engines, document repositories, or business analytics applications. Aspire performs content processing that cleans and repackages data for findability.
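The acquire-enrich-publish flow described above can be sketched as a minimal pipeline. This is an illustrative sketch only; the function names and document shape are invented and do not reflect Aspire’s actual API:

```python
# Hypothetical sketch of an acquire -> enrich -> publish content pipeline.
# All names and the document format are invented for illustration.

def acquire(sources):
    """Yield raw documents from heterogeneous content sources."""
    for source in sources:
        for doc in source:
            yield doc

def enrich(doc):
    """Normalize and clean metadata before publishing."""
    doc = dict(doc)
    doc["title"] = doc.get("title", "").strip().title()
    doc["tags"] = sorted(set(t.lower() for t in doc.get("tags", [])))
    return doc

def publish(doc, index):
    """Stand-in for submitting the document to a search engine or repository."""
    index.append(doc)

index = []
sources = [
    [{"title": "  quarterly report ", "tags": ["Finance", "finance"]}],
    [{"title": "hr policy", "tags": ["HR"]}],
]
for raw in acquire(sources):
    publish(enrich(raw), index)

print([d["title"] for d in index])  # ['Quarterly Report', 'Hr Policy']
```

The point of the sketch is the separation of stages: connectors feed documents in, enrichment happens once in the middle, and any number of downstream targets can sit at the end.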

“Almost all structured data is originally created in a tightly controlled or automated way.

By contrast, unstructured content is created interactively by individual people, and is infinitely variable in its format, style, quality and structure.  Because of this, content processing techniques that were originally developed to work with structured data simply cannot cope with the unpredictability and variability of unstructured content.”

By implementing a content processing application like Aspire, unstructured content is “scrubbed,” then enriched, for better search results. Most commercial search engines do not have filters that weed out the irrelevant content from the good. The results displayed to the user are thus of poor quality and of little to no use. Vendors try to resolve the problem with custom coding and updates for every new data source that pops up, which is tedious. Aspire fixes these tired coding problems by using automated metadata extraction and manipulation outside the search engine.

As powerful as commercial search engines are, they can often lack the refined quality one gets from a robust ISV. Aspire does not follow the same search technology path as its competitors; rather, it has designed a new, original solution to provide its clients with a comprehensive search strategy plan to help improve productivity, organization, and data management.

Remember. Search Technologies is sponsoring a meet up at the October 2012 Enterprise Search Summit. More information is available at http://www.meetup.com/DC-Metro-Enterprise-Search-Network/

Iain Fletcher, October 12, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Deconstructing HP Autonomy and Its Eight Answers

September 26, 2012

All Things Digital ran a story called “Eight Questions for Hewlett Packard Software Head George Kadifa.” Let me nudge aside any thoughts that the interview and the questions were presented as public relations and marketing. I want to view the comments or “answers” as accurate. Once I have highlighted the points which caught my attention, I want to offer some side observations from my goose pond in rural Kentucky.

First, there were two passages which addressed the $12 billion Autonomy purchase.

The first was information about a recent planning meeting. The Autonomy staff were on deck and ready for duty. The key statement for me was this one:

Basically when you look at Autonomy, the core unit is the IDOL Engine, which is the unique capability of meaning-based computing. We’re going to double down on that. In our labs in Cambridge, England, we have 40 or 50 mathematicians writing algorithms. And we’re going to build a team here in the U.S. to productize it and create a platform around it because it has that potential. Frankly, the way Autonomy was managed previously, they put a lot more emphasis into enabling applications, which was fine, but our belief is that there’s a broad agenda, which is creating a platform around meaning-based computing. So we will maintain those apps, but at the same time we’ll open up the capabilities to a broader set of players outside HP.

Makes sense. Pay $12 billion for IDOL. Leverage it.

The second was semi-business school thinking about how to grow Autonomy’s business. Here’s the passage I noted:

In Europe, they tend to make things complex in order to create more value. For example, they saw the IDOL engine as too complex to just give it to people. Instead they thought they should acquire vendors and then create value by enabling applications. Here we take something that’s complex and we ask how we might simplify it in order to give it more scale for a bigger market. So some of that difference was cultural, and some of it was that I think they fell in love with these acquisitions. … We think Autonomy’s technology has broader implications.

I urge you to read the full “eight questions” and the answers. Now my observations:

  1. Productizing IDOL or any search engine can be difficult. When I use the word “difficult,” I mean time consuming, expensive, and timetable free. Buying a search engine and sticking it in a product or service looks easy. It is not. In fact, IBM has elected to use open source search to provide the basics. Now IBM is working hard to make money from its value add system, the game show winner Watson. There may be a product in “there”, but it is often hard to find a way to make money. HP has to pay back the $12 billion it spent and then grow the Autonomy business, which was within shouting distance of $1 billion.
  2. The notion that Europeans see the world differently from HP is interesting. I am not sure how European Autonomy was. My view is that Autonomy’s senior management acquired companies and did upselling. As a result, only Autonomy broke through the glass ceilings behind which Endeca, Exalead, ISYS, and Fast Search & Transfer were trapped. Before applying business school logic to Autonomy, perhaps one should look at how other acquired search vendors have paid off. The list is, based on my research, a short one indeed. Microsoft, for example, has made Fast Search a component of SharePoint. With Fast Search nearing or at its end of life, Microsoft faces more search challenges, not fewer. HP may find itself facing more challenges than it expects.
  3. The notion of “broader applications” is a popular one. Dassault Systèmes acquired Exalead, which is arguably better and more recent technology than IDOL. But Dassault’s senior managers continue to look for ways to convert a more modest expenditure for Exalead into a river of revenue. Dassault has a global approach and many excellent managers. Even for such an exceptional firm, search is not what it seemed to be; that is, a broad application which slots into many customer needs. Reality, based on my research for The New Landscape of Search, is different from the business school map.

HP is making a trip which other companies have taken before. My view is that HP will have to find answers to these questions, which were not part of the interview cited above:

First, how will HP pay off the purchase price, grow Autonomy’s revenue, and generate enough money to have an impact on HP’s net profit? My work has pointed out that cost control is the major problem search vendors face. It takes money to explain a system no matter how productized it becomes. It takes money to support that technology. It takes money to enhance that system. It takes money to hire people who can do the work. In short, search becomes a bright blip on most CFOs’ radar screens. HP may be different, but I am not sure that the cost issue will remain off the radar for very long.

Second, IDOL is a complex collection of software components. The core is Bayesian, but much of ancillary IDOL consists of the add-ons, enhancements, and features which have been created and applied to the base system over the last two decades. Yep, two decades. In search, most of the systems which have figured in big deals in the last two years date from the mid to late 1990s. The more modern systems are not search at all. These new systems leapfrog key word search and push into high value opportunities. HP may be forced to buy one or more of these next generation systems just to stay in the “beyond search” game.
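For readers who have not seen the Bayesian idea behind engines of this sort, here is a toy illustration of scoring a document by how much its terms shift the odds of relevance. The probabilities are made up, and this is emphatically not IDOL’s implementation, just the textbook mechanism:

```python
# Toy naive Bayes relevance scoring. The term probabilities below are
# invented for illustration; this is the general Bayesian idea only,
# not IDOL's actual algorithm.
import math

# Assumed toy statistics: P(term | relevant) and P(term | not relevant).
p_term_given_rel = {"merger": 0.30, "lawsuit": 0.20, "picnic": 0.01}
p_term_given_irr = {"merger": 0.05, "lawsuit": 0.05, "picnic": 0.10}

def log_odds(doc_terms, prior_rel=0.5):
    """Log odds that a document is relevant, given its terms."""
    score = math.log(prior_rel / (1 - prior_rel))
    for t in doc_terms:
        if t in p_term_given_rel:
            score += math.log(p_term_given_rel[t] / p_term_given_irr[t])
    return score

# A document about a merger and a lawsuit outscores one about a picnic.
print(log_odds(["merger", "lawsuit"]) > log_odds(["picnic"]))  # True
```

The practical difficulty, and one reason productizing such engines is hard, is that the probability tables have to be estimated from training data for each customer’s corpus.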

Third, HP is a large company and it faces considerable competition in software. What makes HP interesting is that it has not been able to make its services business offset the decline in personal computers and ink. HP now wants to prove that it can make services work, but as the Inquirer pointed out in mid August 2012:

HP’s write-down of EDS might have resulted in just a paper loss – the firm didn’t actually lose $9bn in cash – but it provides an insight into how a decade of mismanagement has left HP in a bad situation. The fact is that HP cannot lay the blame on diminishing PC sales because its enterprise business, printing and services divisions all reported losses, too. For HP to write down the purchase of EDS, a company it paid $13.9bn for just four years ago, strongly suggests that those who were at the helm of HP in the run-up to that acquisition simply had no clue as to how much EDS was really worth and how to incorporate the company into HP. The value of any company can go down over time – just look at AOL, Microsoft or Yahoo – but for an established business such as EDS to be overvalued by almost $10bn just four years after being acquired is nothing short of gross incompetence by HP in both the purchase and the subsequent handling of the firm once it became a part of HP.

I don’t fully agree with the Inquirer’s viewpoint. But one fact remains: HP must demonstrate that it can manage a complex business based on IDOL, a technology which is not a spring chicken. The man who managed Autonomy to almost $1 billion in sales is no longer with HP. In the history of enterprise search and content processing, Mike Lynch was unique. Perhaps the loss of that talent will continue to impact HP’s plans for a different approach to the market for Autonomy’s technology?

Life extension treatments are available, but these often do not work as expected and can be expensive. Most fail in the end.

Stephen E Arnold, September 25, 2012

Sponsored by Augmentext

Search: A Persistent Disconnect between Reality and Innovation

August 17, 2012

Two years ago I wrote The New Landscape of Search. Originally published by Pandia in Norway, the book is now available without charge when you sign up for  our new “no holds barred” search newsletter Honk!. In the discussion of Microsoft’s acquisition of Fast Search & Transfer SA in 2008, I cite documents which describe the version of Fast Search which the company hoped to release in 2009 or 2010. After the deal closed, the new version of Fast seemed to drop from view. What became available was “old” Fast.

I read the InfoWorld story “Bring Better Search to SharePoint.” Set aside the PR-iness of the write up. The main point is that SharePoint has a lousy search system. Think of the $1.2 billion Microsoft paid for what seems to be, according to the write up, a mongrel dog. My analysis of Fast Search focused on the age of the code, which dates from the late 1990s, and on its use of proprietary, third party, and open source components. Complexity and the 32-bit architecture were in need of attention beyond refactoring.

The InfoWorld passage which caught my attention was:

Longitude Search’s AptivRank technology monitors users as they search, then promotes or demotes content’s relevance rankings based on the actions the user takes with that content. In a nutshell, it takes Microsoft’s search-ranking algorithm and makes it more intelligent…

The solution to SharePoint’s woes amounts to tweaking. In my experience, there are many vendors offering similar functionality and almost identical claims regarding fixing up SharePoint. You can chase down more at www.arnoldit.com/overflight.
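The promote-or-demote feedback loop described in the quoted passage can be sketched in a few lines. This is a hypothetical illustration, not Longitude Search’s actual AptivRank code; the action names and boost values are invented:

```python
# Hypothetical sketch of click-feedback ranking: promote results users
# engage with, demote results they skip. Boost values are invented.

boosts = {}  # doc_id -> multiplicative relevance boost

def record_action(doc_id, action):
    """Nudge a document's boost up or down based on user behavior."""
    step = {"click": 1.1, "dwell": 1.25, "skip": 0.9}[action]
    boosts[doc_id] = boosts.get(doc_id, 1.0) * step

def rerank(results):
    """results: list of (doc_id, base_score); apply learned boosts."""
    return sorted(results,
                  key=lambda r: r[1] * boosts.get(r[0], 1.0),
                  reverse=True)

record_action("doc_b", "click")
record_action("doc_b", "dwell")
record_action("doc_a", "skip")
print(rerank([("doc_a", 1.0), ("doc_b", 0.8)]))
# [('doc_b', 0.8), ('doc_a', 1.0)]: doc_b's boosted score now wins
```

A production system would decay boosts over time and guard against click spam; the multiplicative form here simply keeps untouched documents at a neutral 1.0.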

The efforts are focused on a product with a large market footprint. In today’s dicey economic casino, it makes sense to trumpet solutions to long standing information retrieval challenges in a product like SharePoint. Heck, if I had to pick a market to pump up my revenue, SharePoint is a better bet than some others.

Contrast InfoWorld’s “overcome SharePoint weaknesses” angle with the search assertions in “Search Technology That Can Gauge Opinion and Predict the Future.” We are jumping from the reality of a Microsoft product which has an allegedly flawed search system into the exciting world of what everyone really, really wants—serious magic. Fixing SharePoint is pretty much hobby store magic. Predicting the future: That is big time, hide the Statue of Liberty magic.

Here’s the passage which caught my attention:

A team of EU-funded researchers have developed a new kind of internet search that takes into account factors such as opinion, bias, context, time and location. The new technology, which could soon be in use commercially, can display trends in public opinion about a topic, company or person over time — and it can even be used to predict the future…Future Predictor application is able to make searches based on questions such as ‘What will oil prices be in 2050?’ or ‘How much will global temperatures rise over the next 100 years?’ and find relevant information and forecasts from today’s web. For example, a search for the year 2034 turns up ‘space travel’ as the most relevant topic indexed in today’s news.

Yep, rich indexing, facets, and understanding text are in use.

What these two examples make clear, in my opinion, is that:

Search is broken. If an established product delivers inadequate findability, why hasn’t Microsoft just solved the problem? If off the shelf solutions are available from numerous vendors, why hasn’t Microsoft bought the ones which fix up SharePoint and called it a day? The answer is that none of the existing solutions deliver what users want. Sure, search gets a little better, but the SharePoint search problem has been around for a decade, and Microsoft has the money to do the job if search were such an easy problem to solve. Still a problem? Well, that’s a clue that search is a tough nut to crack in my book. Marketers don’t have to make a system meet user needs. Columnists don’t even have to use the systems about which they write. Pity the users.

Writing about whiz bang new systems funded by government agencies is more fun than figuring out how to get these systems to work in the real world. If SharePoint search does not work, what effort and investment will be required to predict the future via a search query? I am not holding my breath, but the pundits can zoom forward.

The search and retrieval sector is in turmoil, and it will stay that way. The big news in search is that free and open source options are available which work as well as Autonomy- and Endeca-like systems. The proprietary and science fiction solutions illustrate on one hand the problems basic search has in meeting user needs and, on the other hand,  the lengths to which researchers are trying to go to convince their funding sources and regular people that search is going to get better real soon now.

Net net: Search is a problem and it is going to stay that way. Quick fixes, big data, and predictive whatevers are not going to perform serious magic quickly, economically, or reliably without significant investment. InfoWorld seems to see chipper descriptions and assertions as evidence of better search. The Science Daily write up mingles sci-fi excitement with a government funded program to point the way to the future.

Sorry. Search is tough and will remain a chunk of elk hide until the next round of magic is spooned by public relations professionals into the coffee mugs of the mavens and real journalists.

Stephen E Arnold, August 17, 2012

Sponsored by Augmentext

 

IBM Big Data Initiative Coming in Focus with Cloudera, Hadoop Partnerships

May 17, 2012

Big data management and analytics is becoming a key basis of competition as organizations look to turn their complex and large data sets into business assets. In “Analyst Commentary: IBM Adds Search and Broadens Hadoop Strategy with Big Data,” Stuart Lauchlan comments on IBM’s Vivisimo acquisition. Lauchlan says that the acquisition puts to rest the ambiguity of IBM’s Hadoop partnership strategy. He also has this to add about handling big data:

By definition, one of the major problems in discovering the information “nuggets” in Big Data environments is that the volume of data is large and consequently difficult to traverse or search using traditional enterprise search and retrieval (ESR) tools that require the creation and maintenance of indexes before a query can be made. Vivisimo’s offering indexes and clusters results in real time, and its scalability enables dynamic navigation across results delivered, as well as the automation of discovery, reducing the burden/time of analysis.
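The real-time clustering Lauchlan describes can be approximated, in a very reduced form, by grouping results at query time rather than relying on a precomputed index. The grouping rule below (first shared non-stopword) is a deliberately crude stand-in for Vivisimo’s actual clustering technology:

```python
# Simplified sketch of on-the-fly result clustering: group result titles
# by a shared distinctive term at query time. Purely illustrative; not
# Vivisimo's algorithm.
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "for", "in"}

def cluster(titles):
    """Group titles under the first non-stopword each one contains."""
    groups = defaultdict(list)
    for title in titles:
        words = [w for w in title.lower().split() if w not in STOPWORDS]
        key = words[0] if words else "other"
        groups[key].append(title)
    return dict(groups)

results = [
    "Hadoop cluster sizing guide",
    "Hadoop security basics",
    "Vivisimo federated discovery overview",
]
print(sorted(cluster(results)))  # ['hadoop', 'vivisimo']
```

The attraction of query-time grouping is exactly what the quoted passage notes: no index of clusters has to be built and maintained before the user asks a question.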

Even though the actual value of the acquisition has not been declared, we do know that IBM has spent $14 billion in the last seven years on analytics-related products and companies. And while IBM already owns services similar to Vivisimo’s, it seems that IBM saw value in the search software’s newer capabilities, such as federated discovery and navigation. IBM is no doubt trying to take on SharePoint, the major player in enterprise search.

Lauchlan’s article is a comprehensive overview of the IBM strategy. It may be a worthy read to keep in the loop on enterprise search news. But while IBM seeks to develop a comprehensive search solution with big acquisitions, organizations can turn to expert third party solutions to also get the power of efficient and federated search now.

The search experts at Fabasoft Mindbreeze offer a cost-effective suite of solutions to tame big data sprawl and connect your users to the right information at the right time. And with Folio connectors, organizations can access on-premise and cloud data with one easy search. Here you can read about the enterprise search solution:

The data often lies distributed across numerous sources. Fabasoft Mindbreeze Enterprise gains each employee two weeks per year through focused finding of data (IDC Studies). An invaluable competitive advantage in business as well as providing employee satisfaction…But an all-inclusive search is not everything. Creating relevant knowledge means processing data in a comprehensible form and utilizing relations between information objects. Data is sorted according to type and relevance. The enterprise search for professionals.

Navigate to http://www.mindbreeze.com/ to learn more.

Philip West, May 17, 2012

Sponsored by Pandia.com

More Allegations about Fast Search Impropriety

March 8, 2012

With legions of Microsoft Certified Resellers singing the praises of the FS4SP (formerly the Fast Search & Transfer search and retrieval system), sour notes are not easily heard. I don’t think many users of FS4SP know or care about the history of the company, its university-infused technology, or the machinations of the company’s senior management and Board of Directors. Ancient history.

I learned quite a bit in my close encounters with the Fast ESP technology. No, ESP does not mean extra sensory perception. ESP allegedly meant the enterprise search platform. Fast Search, before its purchase by Microsoft, was a platform, not a search engine. The idea was that the collection of components would be used to build applications in which search was an enabler. The idea was a good one, but search-based applications required more than a PowerPoint to become a reality. The 64-bit Exalead system, developed long before Dassault acquired Exalead, was one of the first next generation, post Google systems to have a shot at delivering a viable search-based application. (The race for SBAs, in my opinion, is not yet over, and there are some search vendors like PolySpot which are pushing in interesting new directions.) Fast Search was using marketing to pump up license deals. In fact, the marketing arm was more athletic than the firm’s engineering units. That, in my view, was the “issue” with Fast Search. Talk and demos were good. Implementation was a different platter of herring five ways.


Fast Search block diagram circa 2005. The system shows semantic and ontological components, asserts information on demand, and content publishing functions—all in addition to search and retrieval. Similar systems are marketed today, but hybrid content manipulation systems are often a work in progress in 2012. © Fast Search & Transfer

I once ended up with an interesting challenge resulting from a relatively large-scale, high-profile search implementation. Now you may have larger jobs than I typically get, but I was struggling with the shift from Inktomi to the AT&T Fast search system in order to index the public facing content of the US federal government.

Inktomi worked reasonably well, but the US government decided in its infinite wisdom to run a “free and open competition.” The usual suspects responded to the request for proposal and statement of work. I recall that “smarter than everyone else” Google ignored the US government’s requirements.


This image is from a presentation by Dr. Lervik about Digital Libraries, no date. The slide highlights the six key functions of the Fast Search search engine. These are extremely sophisticated functions. In 2012, only a few vendors can implement a single system with these operations running in the core platform. In fact, the wording could be used by search vendor marketers today. Fast Search knew where search was heading, but the future still has not arrived because writing about a function is different from delivering that function in a time and resource window which licensees can accommodate. © Fast Search & Transfer

Fast Search, with the guidance of savvy AT&T capture professionals, snagged the contract. That was a fateful procurement. Fast Search yielded to a team from Vivisimo and Microsoft. Then Microsoft bought Fast Search, and the US government began its shift to open source search. Another consequence is that Google, as you may know, never caught on in the US Federal government in the manner that I and others assumed the company would. I often wonder what would have happened if Google’s capture team had responded to the statement of work instead of pointing out that the requirements were not interesting.


Discover Point: Search Shows the Unseen

February 16, 2012

Discover Point comes at search and retrieval with the promise to "automatically connect people with highly relevant information." I find this interesting because it makes search into a collaborative type of solution. Different from a search enabled application, Discover Point pops up a conceptual level. After all, who wants another app? When I need information, I usually end up talking to an informed individual.

Government Computer News reported on this approach in the write up “An Info and Expertise Concierge for the Office.” GCN perceives Discover Point as having a solution for the US government which “prevents agencies from constantly reinventing the wheel and instead helps users move forward with new tasks and projects…” This is an interesting marketing angle because it shifts from assertions that few understand such as semantics, ontologies, and facets.

GCN continues:

DiscoverPoint from Discover Technologies is designed to point users in the direction of the most relevant information and subject-matter experts within the shared platform environment. As your job focus changes, so do the searches that DiscoverPoint makes….But the really cool things start happening after you’ve been using the system for a while. As more personnel and documents relevant to what you are doing become available on the system, they will show up on your discovery page.

The idea of having a system “discover” information appeals to the GCN professionals giving Discover Point a test drive.

Discover Point is compatible with SharePoint, Microsoft's ubiquitous content management, collaboration, search, and kitchen sink solution. Discover Point's news release emphasizes that the firm's approach is unique. See "Discover Point Software Selected Product of the Month by Government Computer News." The Discover Point Web site picks up this theme:

Discover Technologies’ approach is truly unique, in that we do not require the manual creation of databases or MySites or other repositories to understand the needs of each and every user. We continuously analyze the content they dwell in, and establish an understanding of the users’ interests based on that content. Once this user understanding is gained, and this happens very quickly, then the proactive delivery of information and ‘people’ is enabled and the cost savings and quality benefits are realized.

Unique is a strong word. The word suggests to me something which is the only one of its kind or without an equal or an equivalent. There are many SharePoint search, retrieval, and discovery solutions in the market at this time. The president’s letter tells me:

‘Discover’ is able to understand what your users need, in terms of both information and ‘experts’ with whom they should be collaborating. This understanding is gained via our patent pending algorithms, which are able to examine user related content and ‘understand’ the subject matter being addressed, and therefore the subject matter that each and every one of your employees is focused on. Once this takes place, our products can deliver both info and people to your users, personalized to match their individual needs. The bottom line is that you need your experts, your most highly paid and critical personnel, to minimize the amount of time they spend doing administrative or manual activities and to maximize the time spent tackling the key problems that they are uniquely qualified to address. That is what DiscoverPoint does for you, and it pays for itself in very short order!

The company offers an Extensible Search Framework and an Advanced Connector Engine. The company also performs customer UIS (an acronym with which I am unfamiliar). The firm also has a software integration business, performs “high performance data indexing”, and offers professional services.

The company has an interesting marketing message. I noticed that Google’s result page includes a reference to IDOL, Autonomy’s system. We will monitor the firm’s trajectory because it looks like a hybrid which combines original software, a framework, consulting, and services. Maybe Forrester, Gartner, and Ovum will emulate Discover Technologies’ Swiss Army knife approach to findability and revenue generation?

Stephen E Arnold, February 16, 2012

Sponsored by Pandia.com

Exogenous Complexity 1: Search

January 31, 2012

I am now using the phrase “exogenous complexity” to describe systems, methods, processes, and procedures which are likely to fail due to outside factors. This initial post focuses on indexing, but I will extend the concept to other content centric applications in the future. Disagree with me? Use the comments section of this blog, please.

What is an outside factor?

Let’s think about value adding indexing, content enrichment, or metatagging. The idea is that unstructured text contains entities, facts, bound phrases, and other identifiable strings. A key word search system is mostly blind to the meaning of a number in the form nnn nn nnnn, which in the United States is the pattern for a Social Security Number. There are similar patterns in Federal Express tracking numbers, financial account numbers, and other types of sequences. The idea is that a system will recognize these strings and tag them appropriately; for example:

nnn nn nnnn Social Security Number

Thus, a query for Social Security Numbers will return strings of digits matching the pattern. The same logic can be applied to certain entities. With the help of a knowledge base, Bayesian numerical recipes, and other techniques such as synonym expansion, a system can determine that a query for Obama residence should return White House, or that a query for the White House should return links to the Obama residence.
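To make the tagging idea concrete, here is a minimal sketch, not any vendor's actual implementation, of how a pattern-based tagger might recognize SSN-shaped strings and wrap them in a metadata tag. The function name, tag shape, and sample text are illustrative assumptions.

```python
import re

# A US Social Security Number shaped string: nnn nn nnnn, with
# spaces or hyphens as separators. This illustrates the tagging
# idea only; a production system would also validate number
# ranges and use surrounding context to avoid false drops.
SSN_PATTERN = re.compile(r"\b(\d{3})[- ](\d{2})[- ](\d{4})\b")

def tag_ssns(text):
    """Wrap SSN-shaped strings in an entity tag for downstream search."""
    return SSN_PATTERN.sub(r'<entity type="ssn">\1-\2-\3</entity>', text)

print(tag_ssns("Employee record: 123 45 6789 on file."))
# → Employee record: <entity type="ssn">123-45-6789</entity> on file.
```

A query for the entity type "ssn" can then retrieve every document containing such a string, regardless of the key words around it.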

One wishes that value added indexing systems were as predictable as a kabuki drama. What vendors of next generation content processing systems participate in is a kabuki which leads to failure two thirds of the time. A tragedy? It depends on whom one asks.

The problem is that companies offering automated solutions to value adding indexing, content enrichment, or metatagging are likely to fail for three reasons:

First, there is the issue of humans who use language in unexpected or what some poets call “fresh” or “metaphoric” ways. English is synthetic in that any string of sounds can be used in quite unexpected ways. Whether it is the use of the name of the fruit “mango” as a code name for software or the conversion of a noun like information into a verb like informationize, which appears in Japanese government English language documents, the automated system may miss the boat. When the boat is missed, continued iterations try to arrive at the correct linkage, but anyone who has used fully automated systems, or who paid attention in math class, knows that recovery from an initial error can be time consuming and sometimes difficult. Therefore, an automated system, no matter how clever, may find itself fooled by the stream of content flowing through its content processing workflow. The user pays the price because false drops mean more work and suggestions which are not just off the mark but difficult for a human to figure out. You can get the inside dope on why poor suggestions are an issue in Thinking, Fast and Slow.

