The Microsoft Yahoo Fiasco: Impact on SharePoint and Web Search

May 5, 2008

You can’t look at a Web log without seeing dozens of postings about Microsoft’s adventure with Yahoo. You can grind through the received wisdom on Techmeme River, a wonderful as-it-happened service. In this Web log posting, I want to recap some of my views about this remarkable digital charge at a windmill. On this cheery Monday in rural Kentucky, I can see a modern Don Quixote, who looks quite a bit like Steve Ballmer, thundering down another digital hollow.

What’s the impact on SharePoint search?

Zip. Nada. None. SharePoint search is not one thing. Read my essay about MOSS and MSS. They add up to a MESS. I’m still waiting for the well-dressed but enraged Fast Search PR wizard to shake a pointed lance at me for that opinion. Fast Search is sufficiently complex and SharePoint sufficiently Microsoftian in its design to make quick movement in the digital swamp all but impossible.

A T Ball player can swing at the ball until he or she gets a hit, ideally (for the parents) a home run. Microsoft, like a T Ball player, will be swinging for an online hit until the ball soars from the park, scoring a home run and earning the adulation of the team.

Will Fast Search & Transfer get more attention?

Nope. Fast Search is what it is. I have commented on the long slog this acquisition represents elsewhere. An early January 2008 post provides a glimpse of the complexity that is ESP (that’s enterprise search platform, not extrasensory perception). A more recent discussion, which I posted on April 26, 2008, talks about the “perfect storm” of Norwegian management expertise, Microsoft’s famed product manager institution, and various technical currents. These posts caused Fast Search’s ever-infallible PR gurus to try to cook Beyond Search’s goose. The goose, a nasty bird indeed, side-stepped the charging wunderkind and his hatchet.

Will Microsoft use the Fast Search Web indexing system for Live.com search?

Now that’s a good question. But it misses the point of the “perfect storm” analysis. To rip and replace the Live.com search requires some political horse trading within Microsoft and across the research and product units. Fast Search is arguably a better Web indexing system, but it was not invented at Microsoft, and I think that may present a modest hurdle for the Norwegian management wizards.


Microsoft: Excellence in Action

July 25, 2022

I wanted to print one page of text. I thought a copy of the cute story about the antics of Elon and Sergey might be nice to keep. My hunch is that some of the content might be disappeared or become tough to see through the cloud of legal eagles responding to the interesting story. Sorry.

Nope.

Why?

Microsoft seems to be unable to update Windows without breaking a simple function. Was I alone in experiencing this demonstration of excellence? Nope. See “Microsoft Warns That New Windows Updates May Break Printing.” The article states:

Microsoft said that the temporary fix has now been disabled by this week’s optional preview updates on Windows Server 2019 systems. This change will lead to printing and scanning failures in Windows environments with non-compliant devices.

There you go. Non-compliant.

But wait, there’s more!

“New Windows 11 Update Breaks the Start Menu Because Microsoft Hates Us All” explains:

It looks like Microsoft has once again shipped dodgy Windows 11 updates, with reports suggesting that the two latest cumulative updates have been causing serious issues with the Start menu. The updates in question are KB5015882 and KB5015814, and it looks like they’ve introduced a bug which causes the Start menu to disappear when you click to open it.

What do these examples suggest to me?

  1. A breakdown in basic quality control. Perhaps the company is preoccupied with layoffs, knock-on effects from SolarWinds, and giving speeches about employee issues.
  2. Alleged monopolies lack the management tools to deliver products and services which function as the marketing collateral asserts.
  3. Employees follow misguided rules; for example, the Wall Street Journal’s assertion that employees should “ditch office chores that don’t help you get ahead.” (See page A11, July 25, 2022.) If an employee is not as informed as a project lead or manager, how can the uninformed make a judgment about what is and what is not significant? This line of wacko reasoning allows companies with IBM-type thinking to provide quantum-safe algorithms BEFORE there are quantum computers which can break known encryption keys. Yep, the US government buys into this type of “logic” as well. Hello, NIST? Are you there?

Plus, Microsoft Teams, which is not exactly the most stable software on my Mac Mini, is going to get more exciting features. “Microsoft Is Launching a Facebook Rip-Off Inside Teams.” This article reports:

Microsoft is now launching Viva Engage today, a new Facebook-like app inside Teams that encourages social networking at work. Viva Engage builds on some of the strengths of Yammer, promoting digital communities, conversations, and self-expression in the workplace. While Yammer often feels like an extension of SharePoint and Office, Viva Engage looks like a Facebook replica. It includes a storylines section, which is effectively your Facebook news feed, featuring conversational posts, videos, images, and more. It looks and feels just like Facebook, and it’s clearly designed to feel similar so employees will use it to share news or even personal interests.

That’s exactly what I don’t want when “working.” The idea for me is to get a project, finish it, and move on to another project. Sound like kindergarten? Well, I listened to Mrs. Fenton. Perhaps some did not heed basic tips about generating useful outputs. Yeah, Teams gets features added when the service does not do the job on some Macs. Great work from the Windows Phone and Surface units’ employer.

Net net: Problems? Yes. Fixable? I have yet to see proof that Microsoft can remediate its numerous technical potholes. Remember that Microsoft asserted that Russia organized 1,000 programmers to make Microsoft’s security issues more severe. In my view, Russia has demonstrated its inability to organize tanks, let alone complex coordinated software exploits. Come on, Microsoft.

Printers!

Stephen E Arnold, July 25, 2022

Microsoft: Stunned by Its Own Insecure Petard?

March 12, 2021

I read “10 Key Microsoft Ignite Takeaways for CIOs.” Marketing fluff except for one wild and crazy statement. Here’s the passage I found amusing:

By midyear, enterprises will also be able to control in which datacenter Microsoft stores documents shared through Teams, group by group or even for individual users, making it more useful in some regulated industries or where there are concerns about the security of data. These controls will mirror those available for Exchange and SharePoint. There will also be an option to make end-to-end-encrypted one-to-one voice or video calls, that CIOs can enable on a per-employee basis, and to limit meeting attendance only to invited participants. A future update could see the addition of end-to-end encrypted meetings, too. For companies that are centralizing their investment in such collaboration, McQuire said, “Security is arguably the number one selection criterion.”

Assume this number one selection criterion is on the money. What’s the Microsoft security posture with SolarWinds and the Exchange breaches?

That petard packs quite a wallop, and it is not from marketing hoohah. There’s nothing like a marketing-oriented conference to blow smoke to obfuscate the incredible security issues Microsoft has created. But conferences and marketing talk are easier than remediating the security problems.

Stephen E Arnold, March 12, 2021

BA Insight: Interesting Spin for Enterprise Search

March 4, 2020

DarkCyber noted BA Insight’s blog post “Make Federation A Part Of Your Single Pane Of Glass.” What’s interesting in the write up are the assertions about enterprise search. Note that the BA Insight Web site includes search along with a number of other terms, including “knowledge,” “seekers,” “connectors,” “smart hub,” and “auto classification.”

Let’s look at the assertions which attracted DarkCyber’s attention.

  1. “Many have considered enterprise search to be too complex.” Interesting, but a number of companies have failed because what people want a search system to deliver is inherently tricky. The Google Search Appliance was “easier” to implement than a local install of Entopia, for example, but the GSA failed because meeting information needs is difficult in many cases.
  2. Users want a “single pane of glass.” Plus, “This improved unified view will dramatically improve the search experience.” The problem is that information is not equal. Lawyers have to guard litigation information. Drug researchers have to keep pharma research under wraps. Human resources, what some millennials call “people” teams, has to guard employee health data, salary information, and data related to hiring distributions. (A sketch of this security-trimming problem follows this list.) The “single pane of glass” is an interesting assertion, but federation is more difficult to achieve than some believe… until the services and consulting fees are tallied.
  3. “And, you go live quickly, instantly adding value (you don’t wait six months for crawling to complete).” The speed with which a customer can go live depends upon a number of factors; for example, dealing with security levels, processing content so that it is “findable” by a user, and latencies which creep into distributed systems. “Instantly” is an appealing term, like “new.” But instantly?
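To make the federation point concrete, here is a minimal sketch, in Python, of why a “single pane of glass” is harder than it sounds: results from each source must be security-trimmed for the user before they can be merged. The Result shape and the group names are invented for illustration; this is not BA Insight’s implementation.

    # Minimal sketch: merge federated results after per-user ACL trimming.
    # The Result shape and group names are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class Result:
        title: str
        score: float
        allowed_groups: frozenset  # groups permitted to see this document

    def trim_and_merge(result_sets, user_groups):
        """Drop results the user may not see, then rank the remainder."""
        visible = [r for results in result_sets for r in results
                   if r.allowed_groups & user_groups]
        return sorted(visible, key=lambda r: r.score, reverse=True)

    legal = [Result("Litigation hold memo", 0.92, frozenset({"legal"}))]
    hr = [Result("Salary bands", 0.88, frozenset({"hr"}))]
    print(trim_and_merge([legal, hr], user_groups={"hr"}))
    # Only the HR document survives trimming for an HR-group user.

Every repository added to the “single pane” adds another ACL model to map, which is where the services and consulting fees come in.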

Several observations:

  1. BA Insight is a vendor of search and retrieval services for organizations. The company has worked very hard to explain that search is more than search.
  2. The benefits of the BA Insight approach read like a checklist of the types of problems which once plagued most enterprise search vendors from Autonomy to Verity. Unfortunately many of these challenges remain today.
  3. BA Insight has moved from its SharePoint-centric approach to a wider range of platforms.

The marketing is interesting. Data backing the assertions would be helpful.

Stephen E Arnold, March 4, 2020

Microsoft and Data Practices: No Backups as a Little Aerial Burst Burns Backup Floppies

July 4, 2019

I read “Microsoft Restores Deleted Technet and MSDN Blogs.” The title is incorrect. DarkCyber suggests “Microsoft Cannot Restore Deleted Blogs Because Backup Practices Fail.” I rarely pay attention to old Microsoft anything. Sure, we noticed that a desktop computer reported that the registration code was no longer valid. We plugged in another legal code and forgot about Microsoft’s odd ineptness with any type of data management. Hey, where are my digital books?

The point of this write up, composed deep in the hollows of rural Kentucky, is encapsulated in this passage:

The problem with the above delete and restore operation: Apparently there was no backup, but you had to restore it from any backups. There is a risk that parts will be lost or that the structure will not return in its old form.

Ever wonder why backups of SQL Server don’t work? Ever wonder where documents went in SharePoint? What happened to historical data in Bing queries?

If the statement highlighted above is accurate, the reason is that Microsoft’s data practices leave something to be desired; for example, stringent application of such mantras as the 3-2-1 backup procedure and software that sort of actually works. Hey, where are those restore points?

In the last few days, Facebook nuked itself. Google undergoes self-inflicted carpet bombing consistently. Now Microsoft reveals that a fundamental function has been ignored or simply does not work.

What’s up? Complexity hides problems until the fragile super-duper structures break down. Of course, if the write up is sour grapes, Microsoft remains just the wonderfulest outfit in the digital world.

Stephen E Arnold, July 4, 2019

Lucidworks: The Future of Search Which Has Already Arrived

August 24, 2017

I am pushing 74, but I am interested in the future of search. The reason is that with each passing day I find it more and more difficult to locate the information I need in my routine research for my books and other work. I was anticipating a juicy read when I requested a copy of “Enterprise Search in 2025.” The “book” is a nine-page PDF. After two years of effort and much research, my team and I were able to squeeze the basics of Dark Web investigative techniques into about 200 pages. I assumed that a nine-page book would deliver a high-impact payload comparable to one of the chapters in one of my books like CyberOSINT or Dark Web Notebook.

I was surprised that a nine-page document was described as a “book.” I was quite surprised by Lucidworks’ description of the future. For me, Lucidworks is describing information access already available to me and to most companies from established vendors.

The book’s main idea, in my opinion, is as understandable as this unlabeled, data-free graphic which introduces the text content assembled by Lucidworks.

[Unlabeled, data-free graphic from the Lucidworks report]

However, the pamphlet’s text does not make this diagram understandable to me. I noted these points as I worked through the basic argument that client-server search is on the downturn. Okay. I think I understand, but the assertion that “Solr killed the client-server stars” was interesting. I read this statement and highlighted it:

Other solutions developed, but the Solr ecosystem became the unmatched winner of the search market. Search 1.0 was over and Solr won.

In the world of open source search, Lucene and Solr have gained adherents. Based on the information my team gathered when we were working on an IDC open source search project, the dominant open source search system was Lucene. If our data were accurate when we did the research, Elastic’s Elasticsearch had since emerged as the go-to open source search system. Alternatives like Solr and Flaxsearch have their users and supporters, but Elastic, founded by Shay Banon, offered a definite step up from his earlier search service called Compass.

In the span of two and a half years, Elastic had garnered more than $100 million in funding by 2014 and expanded into a number of adjacent information access market sectors. Reports I have received from those attending Elastic meetings were that Elastic was putting considerable pressure on proprietary search systems and a bit of a squeeze on Lucidworks. Google’s withdrawing its odd duck Google Search Appliance may have been, in small part, due to the rise of Elasticsearch and the changes made by organizations trying to figure out how to make sense of the digital information to which their staff had access.

But enough about the Lucene-Solr and open source versus proprietary search yin and yang tension.


MC+A Is Again Independent: Search, Discovery, and Engineering Services

December 7, 2016

Beyond Search learned that MC+A has added a turbo-charger to its impressive search, content processing, and content management credentials. The company, based in Chicago, earned a gold star from Google for MC+A’s support and integration services for the now-discontinued Google Search Appliance. After working with the Yippy implementation of Watson Explorer, MC+A retains its search and retrieval capabilities but has expanded its scope. Michael Cizmar, the company’s president, told Beyond Search, “Search is incredibly important, but customers require more multi-faceted solutions.” MC+A provides the engineering and technical capabilities to cope with Big Data, disparate content, cloud and mixed-environment platforms, and the type of information processing needed to generate actionable reports. [For more information about Cizmar’s views about search and retrieval, see “An Interview with Michael Cizmar.”]

Cizmar added:

We solve organizational problems rooted in the lack of insight and accessibility to data that promotes operational inefficiency. Think of a support rep who has to look through five systems to find an answer for a customer on the phone. We are changing the way these users get to answers by providing them better insights from existing data securely. At a higher level we provide strategy support for executives looking for guidance on organizational change.


Alphabet Google’s decision to withdraw the Google Search Appliance has left more than 60,000 licensees looking for an alternative. Since the début of the GSA in 2002, Google trimmed the product line and did not move the search system to the cloud. Cizmar’s view of the GSA’s 12-year journey reveals that:

The Google Search Appliance was definitely not a failure. The idea that organizations wanted an easy-to-use, reliable Google-style search system was ahead of its time. Current GSA customers need some guidance on planning and recommendations on available options. Our point of view is that it’s not the time to simply swap out one piece of metal for another even if vendors claim “OEM” equivalency. The options available for data processing and search today all provide tremendous capabilities, including cognitive solutions which provide amazing capabilities to assist users beyond the keyword search use case.

Cizmar sees an opportunity to provide GSA customers with guidance on planning and recommendations on available options. MC+A understands the options available for data processing and information access today. The company is deeply involved in solutions which tap “smart software” to deliver actionable information.

Cizmar said:

Keyword search is a commodity at this point, and we are helping our customers put search where the user is without breaking an established workflow. Answers, not laundry lists of documents to read, are paramount today. Customers want to solve specific problems; for example, reducing average call time in customer support using smart software or adaptive, self-service solutions. This is where MC+A’s capabilities deliver value.

MC+A is cloud savvy. The company realized that cloud and hybrid (cloud/on-premises) solutions were ways to reduce costs and improve system payoff. Cizmar was one of the technologists recognized by Google for innovation in cloud applications of the GSA. MC+A builds on that engineering expertise. Today, MC+A supports Google, Amazon, and other cloud infrastructures.

Cizmar revealed:

Amazon Elastic Cloud Search is probably doing as much business as Google did with the GSA but in a much different way. Many of these cloud-based offerings are generally solving the problem with the deployment complexities that go into standing up Elasticsearch, the open source version of Elastic’s information access system.

MC+A does not offer a one-size-fits-all solution. He said:

The problem still remains of what should go into the cloud, how to get a solution deployed, and how to ensure usability of the cloud-centric system. The cloud offers tremendous capabilities in running and scaling a search cluster. However, with the API consumption model that we have to operate in, getting your data out of other systems into your search clusters remains a challenge. MC+A does not make security an afterthought. Access controls and system integrity have high priority in our solutions.

MC+A takes a business approach to what many engineering firms view as a technical problem. The company’s engineers examine the business use case. Only then does MC+A determine if the cloud is an option and, if so, which product or project capabilities meet the general requirements. After that process, MC+A implements its carefully crafted, standard deployment process.

Cizmar noted:

If you are a customer with all of your data on premises or have a unique edge case, it may not make sense to use a cloud-based system. The search system needs to be near to the content most of the time.

MC+A offers its white-labeled search “Practice in a Box” to former Google partners and other integrators. High-profile specialist vendors like Onix in Ohio are able to resell this technology, backed by the MC+A engineering team.

In 2017, MC+A will roll out a search solution which is, at this time, shrouded in secrecy. This new offering will go “beyond the GSA” and offer expanded information access functionality. To support this new product, MC+A will announce a specialized search practice.

He said:

This international practice will offer depth and breadth in selling and implementing solutions around cognitive search, assist, and analytics with products other than Google throughout the Americas. I see this as beneficial to other Google and non-Google resellers because it allows them to utilize our award-winning team, our content filters, and a wealth of social proofs on a just-in-time basis.

For 2017, MC+A offers a range of products and services. Based on the limited information provided by the secrecy-conscious Michael Cizmar, Beyond Search believes that the company will offer implementation and support services for Lucene and Solr, IBM Watson, and Microsoft SharePoint. The SharePoint support will embrace some vendors supplying SharePoint-centric solutions like Coveo. Plus, MC+A will continue to offer software to acquire content and perform extract-transform-load functions on premises, in the cloud, or in hybrid configurations.

MC+A brings a business-technology approach to information access.

For more information about MC+A, contact sales@mcplusa.com or call 312-585-6396.

Stephen E Arnold, December 7, 2016

The Equivalent of a Brexit

August 31, 2016

Britain’s historic vote to leave the European Union has set a precedent. What is the precedent, however? Is it the choice to leave an organization? The choice to maintain independence? Or is it a basic example of the right to choose? Brexit will be used as a metaphor for any major upheaval for the next century, so how can it be used in a technology context? BA Insight gives us the answer with “Would Your Users Vote ‘Yes’ For Sharexit?”

SharePoint is Microsoft Office’s collaborative content management program. It can be used to organize projects, build Web sites, store files, and allow team members to communicate. Office workers across the globe also spurn it due to its inefficiencies. To avoid a Sharexit in your organization, the article offers several ways to improve a user’s SharePoint experience. One of the easiest is to build an individual user interface that handles little tasks to make a user’s life easier. Personalizing the individual SharePoint user experience is another method, so the end user does not feel like another cog in the system but rather that SharePoint was designed for them. Two other suggestions are plain, simple advice: take user feedback and actually use it, and make SharePoint the go-to information center for the organization by putting everything on it.

Perhaps the best advice is making information easy to find on SharePoint:

Documents are over here, discussions over there, people are that way, and then I don’t know who the experts really are.  You can make your Intranet a whole lot smarter, or dare we say “intelligent”, if you take advantage of this information in an integrated fashion, exposing your users to connected, but different, information.  You can connect documents to the person who wrote them, then to that person’s expertise and connected colleagues, enabling search for your hidden experts. The ones that can really be helpful often reduce chances for misinformation, repetition of work, or errors. To do this, expertise location capabilities can combine contributed expertise with stated expertise, allowing for easy searching and expert identification.

Developers love SharePoint because it is easy to manage and to roll out information or software to every user. End users hate it because it creates more problems than it resolves. If developers take the time to listen to what the end users need from their SharePoint experience, then they can avoid a Sharexit.

Whitney Grace, August 31, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Facebook and Humans: Reality Is Not Marketing

May 16, 2016

I read “Facebook News Selection Is in Hands of Editors Not Algorithms, Documents Show.” The main point of the story is that Facebook uses humans to do work. The idea is that algorithms do not seem to be a big part of picking out what’s important.

The write up comes from a “real” journalism outfit. The article points out:

The boilerplate about its [Facebook’s] news operations provided to customers by the company suggests that much of its news gathering is determined by machines: “The topics you see are based on a number of factors including engagement, timeliness, Pages you’ve liked and your location,” says a page devoted to the question “How does Facebook determine what topics are trending?”

After reading this, I thought of Google’s poetry created by its artificial intelligence system. Here’s the line which came to mind:

I started to cry. (Source: Quartz)

I vibrate with the annoyance bubbling under the surface of the newspaper article. Imagine. Facebook has great artificial intelligence. Facebook uses smart software. Facebook open sources its systems and methods. The company says it is at the cutting edge of replacing humans with objective procedures.

The article’s belief in baloney is fried and served cold on stale bread. Facebook uses humans. The folks at real journalism outfits may want to work through articles like “Different Loci of Semantic Interference in Picture Naming vs. Word-Picture Matching Tasks” to get a sense of why smart systems go wandering.

So what’s new? Palantir Technologies uses humans to index content. Without that human input, the “smart” software does some useful work, but humans are part of the workflow process.

Other companies use humans too. But the marketing collateral and the fizzy presentations at fancy conferences paint a picture of a world in which cognitive, artificially intelligent, smart systems do the work that subject matter experts used to do. Humans, like indexers and editors, are no longer needed.

Now reality pokes its rose-tinted fingertips into the real world.

Let me be clear. One reason I am not happy with the verbiage generated about smart software comes down to one simple fact.

Most of the smart software systems require humans to fiddle at the beginning when a system is set up, while the system operates to deal with exceptions, and after an output is produced to figure out what’s what. In short, smart software is not that smart yet.

There are many reasons, but the primary one is that the math and procedures underpinning many of the systems with which I am familiar are immature. Smart software works well when certain caveats are accepted. For example, the vaunted Watson must be trained. Watson, therefore, is not that much different from the training Autonomy baked into its IDOL system in the mid-1990s. Palantir uses humans for one simple reason. Figuring out what’s important to a team under fire works much better if the humans with skin in the game provide indexing terms and identify important points like local names for stretches of highway where bombs can be placed without too much hassle. Dig into any of the search and content processing systems and you find expenditures for human work. Companies licensing smart systems which index automatically face significant budget overruns, operational problems because of lousy outputs, and piles of exceptions to either ignore or deal with. The result is that the smoke-and-mirrors systems marketers pitch to people who want a silver bullet cannot perform like the carefully crafted demonstrations. IBM i2 Analyst’s Notebook requires humans. Fast Search (now an earlobe in SharePoint) requires humans. Coveo’s system requires humans. Attivio’s system requires humans. OpenText’s suite of search and content processing requires humans. Even Maxxcat benefits from informed set up and deployment. Out of the box, dtSearch can index, but one needs to know how to set it up and make it work in a specific Microsoft environment. Every search and content processing system that asserts that it is automatic is spackling flawed wallboard.
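As a small, hypothetical illustration of the “deal with exceptions” work mentioned above, the Python sketch below routes machine classifications that fall under a confidence threshold to a human review queue. The threshold and the labels are my assumptions, not any vendor’s actual pipeline.

    # Minimal sketch: send low-confidence automatic output to human reviewers.
    # The 0.80 threshold and the labels are illustrative assumptions.
    REVIEW_THRESHOLD = 0.80

    def triage(doc_id, label, confidence):
        """Accept confident machine output; queue everything else for a person."""
        if confidence >= REVIEW_THRESHOLD:
            return ("auto", doc_id, label)
        return ("human_review", doc_id, label)

    batch = [("d1", "invoice", 0.95), ("d2", "contract", 0.55)]
    for item in batch:
        print(triage(*item))
    # ('auto', 'd1', 'invoice')
    # ('human_review', 'd2', 'contract')

The review queue is where the unbudgeted human hours accumulate.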

For years, I have given a lecture about the essential sameness of search and content processing systems. These systems use the same well known and widely taught mathematical procedures. The great breakthroughs at SRCH2 and similar firms amount to optimization of certain operations. But the whizziest system is pretty much like other systems. As a result, these systems perform in a similar manner. These systems require humans to create term lists, look-up tables of aliases for persons of interest, hand-crafted taxonomies to represent the chunk of reality the system is supposed to know about, and other “libraries” and “knowledgebases.”
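To show what one of those human-built “libraries” looks like in practice, here is a minimal sketch, in Python, of query expansion driven by a hand-crafted alias table. The entries are invented for illustration; real tables are curated and maintained by people.

    # Minimal sketch: expand a query using a human-maintained alias table.
    # The alias entries are illustrative, not a real knowledgebase.
    ALIASES = {
        "ibm": ["ibm", "international business machines", "big blue"],
        "gsa": ["gsa", "google search appliance"],
    }

    def expand_query(terms):
        """Replace each term with its known aliases; unknown terms pass through."""
        expanded = []
        for term in terms:
            expanded.extend(ALIASES.get(term.lower(), [term.lower()]))
        return expanded

    print(expand_query(["IBM", "watson"]))
    # ['ibm', 'international business machines', 'big blue', 'watson']

When the table drifts out of date, so does retrieval quality, which is why the human cost never goes away.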

Watson is a source of amusement to me precisely because the human effort required to make a smart system work is never converted to cost and time statements. People assume Watson won Jeopardy because it was smart. People assume Google knows what ads to present because Google’s software is so darned smart. People assume Facebook mines its data to select news for an individual. Sure, there is automation of certain processes, but humans are needed. Omit the human and you get the crazy Microsoft Tay system, which humans taught to be crazier than some US politicians.

For decades I have reminded those who listened to my lectures not to confuse what they see in science fiction films with reality. Progress in smart software is evident. But the progress is very slow, hampered by the computational limits of today’s hardware and infrastructure. Just like real time, the concept is easy to say but quite expensive and difficult to implement in a meaningful way. There’s a reason millisecond access to trading data costs so much that only certain financial operations can afford the bill. Smart software is the same.

How about less outrage from those covering smart software and more critical thinking about what’s required to get a system to produce a useful output? In short, more info and less puffery, more critical thinking and less sawdust. Maybe I imagined it, but both the Google and Tesla self-driving vehicles have crashed, right? Humans are essential because smart software is not as smart as those who believe in unicorns assume. Demos, like TV game shows, require pre- and post-production, gentle reader.

What happens when humans are involved? Isn’t bias part of the territory?

Stephen E Arnold, May 16, 2016

Enterprise Search Revisionism: Can One Change What Happened?

March 9, 2016

I read “The Search Continues: A History of Search’s Unsatisfactory Progress.” I noted some points which, in my opinion, underscore why enterprise search has been problematic and why the menagerie of experts and marketers has put search and retrieval on the path to enterprise irrelevance. The word that came to mind when I read the article was “revisionism,” for the millennials among us.

The write up ignores the fact that enterprise search dates back to the early 1970s. One can argue that IBM’s Storage and Information Retrieval System (STAIRS) was the first significant enterprise search system. The point is that enterprise search as a productized service has a more than 40-year history of over-promising and under-delivering.

[Image: Enterprise search with a touch of Stalinist revisionism.]

Customers said they wanted to “find” information. What those individuals meant was having access to information that provided the relevant facts, documents, and data needed to deal with a problem.

Because providing on-point information was and remains a very, very difficult problem, the vendors interpreted “find” to mean a list of indexed documents that contained the users’ search terms. But there was a problem. Users were not skilled in crafting queries, which were essentially computer instructions linking words the index actually contained.
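To illustrate what those “computer instructions” amounted to, here is a minimal sketch, in Python, of a STAIRS-style Boolean AND query against an inverted index. The documents and vocabulary are invented; the point is that a query works only when it is phrased in words the index actually contains.

    # Minimal sketch: a Boolean AND query over an inverted index.
    # Documents and terms are invented for illustration.
    DOCS = {
        1: "contract dispute with vendor",
        2: "vendor invoice for services",
        3: "employee handbook revision",
    }

    # Build the inverted index: term -> set of document ids.
    index = {}
    for doc_id, text in DOCS.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)

    def boolean_and(*terms):
        """Return ids of documents containing every query term."""
        postings = [index.get(t, set()) for t in terms]
        return set.intersection(*postings) if postings else set()

    print(boolean_and("vendor", "invoice"))  # {2}
    print(boolean_and("vendor", "lawsuit"))  # set(): term absent from the index

A user who asks for a word the index never saw gets nothing back, which is exactly the experience that frustrated early users.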

After STAIRS came other systems, many other systems, which have been documented reasonably well in Bourne and Bellardo-Hahn’s A History of Online Information Services 1963-1976. (The period prior to 1970 describes for-fee, research-centric online systems. STAIRS was among the most well known early enterprise information retrieval systems.) I provided some history in the first three editions of the Enterprise Search Report, published from 2003 to 2007. I have continued to document enterprise search in the Xenky profiles and in this blog.

The history makes painful reading for those who invested in many search and retrieval companies and for the executives who experienced the crushing of their dreams and sometimes careers under the buzz saw of reality.

In a nutshell, enterprise search vendors heard what prospects, workers overwhelmed with digital and print information, and unhappy users of those early systems were saying.

The disconnect was that enterprise search vendors parroted back marketing pitches that assured enterprise procurement teams of these functions:

  • Easy to use
  • “All” information instantly available
  • Answers to business questions
  • Faster decision making
  • Access to the organization’s knowledge.

The result was a steady stream of enterprise search product launches. Some of these, like Verity, were funded by US government money. Sure, the company struggled with the cost of infrastructure the Verity system required. The workarounds were okay as long as the infrastructure could keep pace with the new and changed word-centric documents. Tossing in other types of digital information, making the system index ever faster, and keeping the Verity system responding quickly was another kettle of fish.

Research-oriented information retrieval experts looked at the Verity-type system and concluded, “We can do more. We can use better algorithms. We can use smart software to eliminate some of the costs and indexing delays. We can [fill in the blank].”

The cycle of describing what an enterprise search system could actually deliver was disconnected from the promises the vendors made. As one moves through the decades from 1973 to the present, the failures of search vendors made it clear that:

  1. Companies and government agencies would buy a system, discover it did not do the job users needed, and buy another system.
  2. New search vendors picked up the methods taught at Cornell, Stanford, and other search-centric research centers and wrapped on additional functions like semantics. The core of most modern enterprise search systems is unchanged from what STAIRS implemented.
  3. Search vendors like Convera came, failed, and went away. Some hit revenue ceilings and sold to larger companies looking for a search utility. The acquisitions hit a high water mark with the sale of Autonomy (a 1990s system) to HP for $11 billion.

What about Oracle, a representative outfit? The Oracle database has included search as a core system function since the day Larry Ellison envisioned becoming a big dog in enterprise software. The search language was Oracle’s version of the structured query language. But people found that difficult to use. Oracle purchased Artificial Linguistics in order to make finding information more intuitive. Oracle continued to try to crack the find-information problem through the acquisitions of TripleHop, its in-house Secure Enterprise Search, and some other odds and ends until it bought, in rapid succession, InQuira (a company formed from the failure of two search vendors), RightNow (built on technology from a Dutch outfit RightNow acquired), and Endeca. Where is search at Oracle today? Essentially search is a utility available in Oracle applications: customer support, ecommerce, and business intelligence. In short, search has shifted from the “solution” to a component used to get started with an application that allows the user to find the answer to business questions.

I mention the Oracle story because it illustrates the consistent pattern of companies which are actually trying to deliver information that the user of a search system needs to answer a business or technical question.

I don’t want to highlight the inaccuracies of “The Search Continues.” Instead I want to point out the problem buzzwords create when trying to understand why search has consistently been a problem and why today’s most promising solutions may relegate search to a permanent role of necessary evil.

In the write up, answering questions, analytics, federation (that is, running a single query across multiple collections of content and file types), the cloud, and system performance are presented as the conclusion.

Wrong.

The use of open source search systems means that good enough is the foundation of many modern systems. Palantir-type outfits, essentially enterprise search vendors describing themselves as “intelligence” providers, use open source technology in order to reduce costs and shift bug chasing to a community. The good enough core is wrapped with subsystems that deal with the pesky problems of video, audio, and data streams from sensors or similar sources. Attivio, formed by professionals who worked at the infamous Fast Search & Transfer company, delivers active intelligence but uses open source to handle the STAIRS-type functions. These companies have figured out that open source search is a good foundation. Available resources can be invested in visualizations, generating reports instead of results lists, and graphical interfaces which involve the user in performing tasks smart software at this time cannot perform.

For a low-cost enterprise search system, one can download Lucene, Solr, SphinxSearch, or any one of a number of open source systems. There are low-cost (keep in mind that the costs of search can be tricky to nail down) appliances from vendors like Maxxcat and Thunderstone. One can make do with the craziness of the search included with Microsoft SharePoint.
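As a concrete example of how low the barrier to entry now is, here is a minimal sketch of indexing and querying a local Solr core from Python via the pysolr client. The core name, field names, and documents are assumptions for illustration, not a recommended configuration.

    # Minimal sketch: add two documents to a local Solr core and query it.
    # Assumes Solr is running locally with a core named "enterprise_docs".
    import pysolr

    solr = pysolr.Solr("http://localhost:8983/solr/enterprise_docs", timeout=10)

    solr.add([
        {"id": "doc-1", "title": "Quarterly sales report"},
        {"id": "doc-2", "title": "HR onboarding guide"},
    ])
    solr.commit()

    for hit in solr.search("title:report"):
        print(hit["id"], hit["title"])

Getting this far is easy; the tricky-to-nail-down costs arrive with connectors, security, and content grooming.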

For a serious application, enterprises have many choices. Some of these are highly specialized like BAE NetReveal and Palantir Metropolitan. Others are more generic like the Elastic offering. Some are free like the Effective File Search system.

The point is that enterprise search is not what users wanted in the 1970s when IBM pitched the mainframe-centric STAIRS system, in the 1980s when Verity pitched its system, in the 1990s when Excalibur (later Convera) sold its system, in the 2000s when Fast Search shifted from Web search to enterprise search and put the company on the road to improper financial behavior, or in the efflorescence of search sell-offs (Dassault bought Exalead, IBM bought iPhrase and other search vendors, and Lexmark bought Brainware and ISYS Search Software).

Where are we today?

Users still want on-point information. The solutions on offer today are application- and use-case-centric, not the silly one-size-fits-all approach of the period from 2001 to 2011, when Autonomy sold to HP.

Open source search has helped create an opportunity for vendors to deliver information access in interesting ways. There are cloud solutions. There are open source solutions. There are small company solutions. There are more ways to find information than at any other time in the history of search as I know it.

Unfortunately, the same problems remain. These are:

  1. As the volume of digital information goes up, so does the cost of indexing and accessing the sources in the corpus
  2. Multimedia remains a significant challenge for which there is no particularly good solution
  3. Federation of content requires considerable investment in data grooming and normalizing (a sketch of this grooming follows the list)
  4. Multi-lingual corpuses require humans to deal with certain synonyms and entity names
  5. Graphical interfaces still are stupid and need more intelligence behind the icons and links
  6. Visualizations have to be “accurate” because a bad decision can have significant real world consequences
  7. Intelligent systems are creeping forward, but crazy Watson-like marketing raises expectations and undermines the credibility of enterprise search’s capabilities.
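A minimal sketch of the data grooming flagged in point 3 above: records from two sources rarely share field names, identifiers, or date formats, and a person has to write and maintain the mapping into a common schema. The source schemas below are invented for illustration.

    # Minimal sketch: normalize records from two sources into one schema
    # before federated indexing. Field names and formats are invented.
    from datetime import datetime

    def from_crm(rec):
        return {
            "id": "crm-{}".format(rec["CustomerID"]),
            "title": rec["Subject"],
            "date": datetime.strptime(rec["Created"], "%m/%d/%Y").date().isoformat(),
        }

    def from_fileshare(rec):
        return {
            "id": "fs-{}".format(rec["path"]),
            "title": rec["filename"],
            "date": rec["modified"][:10],  # already ISO 8601
        }

    print(from_crm({"CustomerID": 42, "Subject": "Renewal call",
                    "Created": "03/09/2016"}))
    print(from_fileshare({"path": "/legal/nda.docx", "filename": "nda.docx",
                          "modified": "2016-03-09T10:22:00Z"}))

Multiply this mapping by every repository, file type, and language in point 4, and the “considerable investment” becomes clear.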

I am okay with history. I am not okay with analyses that ignore some very real and painful lessons. I sure would like some of the experts today to know a bit more about the facts behind the implosions of Convera, Delphis, Entopia, and many other companies.

I also would like investors in search start ups to know a bit more about the risks associated with search and content processing.

In short, for a history of search, one needs more than 900 words mixing up what happened with what is.

Stephen E Arnold, March 9, 2016
