
Taxonomy Turmoil: Good Enough May Be Too Much

February 28, 2015

For years, I have maintained a public indexing Overflight. You can examine the selected outputs at this Overflight link. (My non public system is more robust, but the public service is a useful temperature gauge for a slice of the content processing sector.)

When it comes to indexing, most vendors provide keyword indexing, concept tagging, and entity extraction. But are these tags spot on? No, most are merely good enough.


A happy quack to Jackson Taylor for this “good enough” cartoon. The salesman makes it clear that good enough is indeed good enough in today’s marketing enabled world.

I chose about 50 companies that asserted their systems performed some type of indexing or taxonomy function. I learned that the taxonomy business is “about to explode.” I find that to be either an interesting investment tip or a statement that is characteristic of content processing optimists.

Like search and retrieval, plugging in “concepts” or other index terms is a utility function. If one indexes each word in an article appearing in this blog, the resulting keywords may not reveal what the article is actually about. In this post, for example, I am talking about Overflight, but the real topic is the broader use of metadata in information retrieval systems. I could assign the term “faceted navigation” to this article as a way to mark it as germane to point and click navigation systems.
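To make the utility nature of this function concrete, here is a minimal sketch of the difference between raw keyword indexing and controlled vocabulary tagging. The tiny taxonomy, the two-cue matching rule, and the function names are my own illustration, not any vendor’s method:

```python
# Minimal sketch: raw keyword indexing vs. controlled-vocabulary tagging.
# The tiny taxonomy and matching rule are hypothetical, for illustration.
import re
from collections import Counter

TAXONOMY = {
    "faceted navigation": {"facet", "navigation", "point", "click"},
    "metadata": {"metadata", "tagging", "indexing", "taxonomy"},
}

def keyword_index(text):
    """Index every word; the top terms may miss the real topic."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def assign_concepts(text):
    """Assign a taxonomy term when at least two of its cue words appear."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [term for term, cues in TAXONOMY.items()
            if len(cues & words) >= 2]

doc = "Point and click navigation relies on facet metadata and tagging."
print(keyword_index(doc).most_common(3))  # frequent words, not the topic
print(assign_concepts(doc))  # ['faceted navigation', 'metadata']
```

The point of the sketch is the gap: the keyword counts are accurate but uninformative, while the concept tags require someone to build and maintain the taxonomy, which is exactly the human cost automated systems try to avoid.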

If you examine the “reports” Overflight outputs for each of the companies, you will discover several interesting things as I did on February 28, 2015 when I assembled this short article.

  1. Mergers or buying failed vendors at fire sale prices are taking place. Examples include Lucidea’s purchase of Cuadra and InMagic. Both of these firms are anchored in traditional indexing methods and seemed to operate within a modest revenue envelope until their sell out. Business Objects acquired Inxight, and then SAP acquired Business Objects. Bouvet acquired Ontopia. Teradata acquired Revelytix.
  2. Moving indexing into open source. Thomson Reuters acquired ClearForest and made most of the technology available as OpenCalais. OpenText, a rollup outfit, acquired Nstein. SAS acquired Teragram. Smartlogic acquired Schemalogic. (A free report about Schemalogic is available at www.xenky.com/vendor-profiles.)
  3. A number of companies just failed, shut down, or went quiet. These include Active Classification, Arikus, Arity, Forth ICA, MaxThink, Millennium Engineering, Navigo, Progris, Protege, punkt.net, Questans, Quiver, Reuse Company, and Sandpiper.
  4. The indexing sector includes a number of companies my non public system monitors; for example, the little known Data Harmony with six figure revenues after decades of selling really hard to traditional publishers. Conclusion: Indexing is a tough business to keep afloat.

There are numerous vendors who assert their systems perform indexing, entity, and metadata extraction. More than 18 of these companies are profiled in CyberOSINT, my new monograph. Oracle owns Triple Hop, RightNow, and Endeca. Each of these acquired companies performs indexing and metadata operations. Even the mashed potatoes search solution from Microsoft includes indexing tools. The proprietary XML data management vendor MarkLogic asserts that it performs indexing operations on content stored in its repository. Conclusion: More cyber oriented firms are likely to capture the juicy deals.

So what’s going on in the world of taxonomies? Several observations strike me as warranted:

First, none of the taxonomy vendors are huge outfits. I suppose one could argue that IBM’s Lucene based system is a billion dollar baby, but that’s marketing peyote, not reality. Perhaps MarkLogic, which is struggling toward $100 million in revenue, is the largest of this group. But the majority of the companies in the indexing business are small. Think in terms of a few hundred thousand in annual revenue to $10 million with generous accounting assumptions.

What’s clear to me is that indexing, like search, is a utility function. If a good enough search system delivers good enough indexing, then why spend for humans to slog through the content and make human judgments? Why not let Google funded Recorded Future identify entities, assign geo codes, and extract meaningful signals? Why not rely on Haystax or RedOwl or any one of the more agile firms to deliver higher value operations?

I would assert that taxonomies and indexing are important to those who desire the accuracy of a human indexed system. This assumes that the humans are subject matter specialists, the humans are not fatigued, and the humans can keep pace with the flow of changed and new content.

The reality is that companies focused on delivering old school solutions to today’s problems are likely to lose contracts to companies that deliver what the customer perceives as a higher value content processing solution.

What can a taxonomy company do to ignite its engines of growth? Based on the research we performed for CyberOSINT, the future belongs to those who embrace automated collection, analysis, and output methods. Users may, if they so choose, provide guidance to the system. But the days of yore, when monks with varying degrees of accuracy created catalog sheets for the scriptoria, have been washed to the margin of the data stream by today’s content flows.

What’s this mean for the folks who continue to pump money into taxonomy centric companies? Unless the cyber OSINT drum beat is heeded, the failure rate of the Overflight sample is a wake up call.

Buying Apple bonds might be a more prudent financial choice. On the other hand, there is an opportunity for taxonomy executives to become “experts” in content processing.

Stephen E Arnold, February 28, 2015

Enterprise Search: Security Remains a Challenge

February 11, 2015

Download an open source enterprise search system or license a proprietary system. Once the system has been installed, the content crawled, the index built, the interfaces set up, and the system optimized, the job is complete, right?

Not quite. Retrofitting a keyword search system to meet today’s security requirements is a complex, time consuming, and expensive task. That’s why “experts” who write about search facets, search as a Big Data system, and search as a business intelligence solution ignore security or reassure their customers that it is no big deal. Security is a big deal, and it is becoming a bigger deal with each passing day.

There are a number of security issues to address. The easiest of these is figuring out how to piggyback on access controls provided by a system like Microsoft SharePoint. Other organizations use different enterprise software. As I said, using access controls already in place and diligently monitored by a skilled security administrator is the easy part.
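To make the piggyback idea concrete, here is a minimal sketch of query time security trimming, assuming the search system can look up a user’s group memberships and each indexed document carries an access control list. The data layout and names are my own illustration, not any vendor’s API:

```python
# Minimal sketch of query-time security trimming: hits are filtered
# against each document's access control list (ACL) before display.
# The data layout and the directory stub are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: set = field(default_factory=set)

def user_groups(user):
    # A real deployment would query the directory service (e.g., the
    # groups SharePoint already maintains); here it is a hard-coded stub.
    directory = {"alice": {"legal", "staff"}, "bob": {"staff"}}
    return directory.get(user, set())

def search(query, index, user):
    """Return only the hits the user is entitled to see."""
    groups = user_groups(user)
    hits = [d for d in index if query.lower() in d.text.lower()]
    return [d.doc_id for d in hits if d.allowed_groups & groups]

index = [
    Document("memo-1", "Routine staff memo", {"staff"}),
    Document("hold-7", "Litigation hold inventory", {"legal"}),
]
print(search("memo", index, "bob"))    # ['memo-1']
print(search("hold", index, "bob"))    # [] -- trimmed, bob lacks clearance
print(search("hold", index, "alice"))  # ['hold-7']
```

Easy enough when one system holds the ACLs. The sticky wickets below are hard precisely because the rules live outside any single access control list.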

A number of sticky wickets remain; for example:

  • Some units of the organization may do work for law enforcement or intelligence entities. There may be different requirements. Some are explicit and promulgated by government agencies. Others may be implicit, acknowledged as standard operating procedure by those with the appropriate clearance and the need to know.
  • Specific administrative content must be sequestered. Examples range from information assembled for employee health matters to compliance documentation for pharma products or controlled substances.
  • Legal units may require that content be contained in a managed system with administrative controls in place to ensure that no changes are introduced into a content set, that access is provided only to those with specific credentials, or that material is kept “off the radar” as the in house legal team tries to figure out how to respond to a discovery activity.
  • Some research units may be “black”; that is, no one in the company, including most information technology and security professionals, is supposed to know where an activity is taking place or what information is of interest to the research team, and specialized security steps must be enforced. These can include dongles, air gaps, and unknown locations and staff.


An enterprise search system without NGIA security functions is like a 1960s Chevrolet project car. Buy it ready to rebuild for $4,500 and invest $100,000 or more to make it conform to 2015’s standards.  Source: http://car.mitula.us/impala-project

How do enterprise search systems deal with these access issues? Are not most modern systems positioned to index “all” content? Are the procedures for each of these four examples part of the enterprise search systems’ administrative tool kits?

Based on the research I conducted for CyberOSINT: Next Generation Information Access and my other studies of enterprise search, the answer is, “No.”

Read more

Enterprise Search: Mapless and Lost?

February 5, 2015

One of the content challenges traditional enterprise search trips over is geographic functions. When an employee looks for content, the implicit assumption is that keywords will locate a list of documents in which the information may be located. The user then scans the results list—whether in Google style laundry lists or in the graphic displays popularized by Grokker and Kartoo, both of which have gone dark. (Quick aside: Both of these outfits reflect the influence of French information retrieval wizards. I think of these as emulators of Datops “balls” displays.)


A results list displayed by the Grokker system. The idea is that the user explores the circular areas. These contain links to content germane to the user’s keyword query.

The Kartoo interface displays sources connected to related sources. Once again the user clicks and goes through the scan, open, read, extract, and analyze process.

In a broad view, both of these visualizations are maps of information. Do today’s users want these types of hard to understand maps?

In CyberOSINT I explore the role of “maps,” or more properly geographic intelligence (geoint), geo-tagging, and geographic outputs, derived from automatically collected and analyzed data.

The idea is that a next generation information access system recognizes geographic data and displays those data in maps. Think in terms of overlays on the eye popping maps available from commercial imagery vendors.

What do these outputs look like? Let me draw one example from the discussion in CyberOSINT about this important approach to enterprise related information. Keep in mind that an NGIA system can process any information made available to it; for example, enterprise accounting systems or databased content along with text documents.

In response to either a task, a routine update when new information becomes available, or a request generated by a user with a mobile device, the output looks like this on a laptop:


Source: ClearTerra, 2014

The approach that ClearTerra offers allows a person looking for information about customers, prospects, or other types of data carrying geo-codes to see those items on a dynamic map. The map can be displayed on the user’s device; for example, a mobile phone. In some implementations, the map is a dynamic PDF file which displays the locations of items of interest as those items move. Think of a person driving a delivery truck or an RFID tagged package.
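As a rough illustration of the plumbing beneath such a display, here is a minimal sketch that filters records carrying latitude and longitude values and emits GeoJSON, the common interchange format map layers consume. This is my own simplification, not ClearTerra’s implementation; the record layout is hypothetical:

```python
# Minimal sketch: turn geo-coded records into a GeoJSON overlay a map
# display can consume. A simplification, not ClearTerra's method.
import json

records = [  # e.g., rows pulled from an enterprise accounting system
    {"name": "Delivery truck 12", "lat": 38.25, "lon": -85.76},
    {"name": "Customer site A", "lat": 38.04, "lon": -84.50},
    {"name": "No geo-code yet", "lat": None, "lon": None},
]

def to_geojson(rows):
    """Keep only geo-coded rows and wrap them as GeoJSON point features."""
    features = [
        {
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude]
            "geometry": {"type": "Point",
                         "coordinates": [r["lon"], r["lat"]]},
            "properties": {"name": r["name"]},
        }
        for r in rows
        if r["lat"] is not None and r["lon"] is not None
    ]
    return {"type": "FeatureCollection", "features": features}

print(json.dumps(to_geojson(records), indent=2))
```

A dynamic display simply regenerates the overlay as new geo-coded records arrive, which is what makes the moving delivery truck scenario possible.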

Read more

Enterprise Search: NGIA Vendors Offer Alternative to the Search Box

February 4, 2015

I have been following the “blast from the past” articles that appear on certain content management oriented blogs and news services. I find the articles about federated search, governance, and knowledge related topics oddly out of step with the more forward looking developments in information access.

I am puzzled because the keyword search sector has been stuck in a rut for many years. The innovations touted in the consulting-jargon of some failed webmasters, terminated in house specialists, and frustrated academics are old, hoary with age, and deeply problematic.

There are some facts that cheerleaders for the solutions of the 1970s, 1980s, and 1990s choose to overlook:

  • Enterprise search typically means a subset of content required by an employee to perform work in today’s fluid and mobile work environment. The mix of employees and part timers translates to serious access control work. Enterprise search vendors “support” an organization’s security systems in the manner of a consulting physician to heart surgery: they provide inputs but assume no responsibility.
  • The costs of configuring, testing, and optimizing an old school system are usually higher than the vendor suggests. When the actual costs collide with the budget costs, the customer gets frisky. Fast Search & Transfer’s infamous revenue challenges came about in part because customers refused to pay when the system was not running and working as the marketers suggested it would.
  • Employees cannot locate needed information and don’t like the interfaces. The information is often “in” the system but not in the indexes. And if it is in the indexes, the users cannot figure out which combination of keywords unlocks what’s needed. The response is, “Who has time for this?” When satisfaction is measured, somewhere between 55 and 75 percent of the search system’s users report they don’t like it very much.

Obviously organizations are looking for alternatives. Some rely on open source solutions, which are good enough. Other organizations put up with Windows’ search tools, which are also good enough. More important software systems like enterprise resource planning or accounting systems come with basic search functions. Again: these are good enough.

The focus of information access has shifted from indexing a limited corpus of content using a traditional solution to a more comprehensive, automated approach. No software is without its weaknesses. But compared to keyword search, there are vendors pointing customers toward a different approach.

Who are these vendors? In this short write up, I want to highlight the type of information about next generation information access vendors in my new monograph, CyberOSINT: Next Generation Information Access.

I want to highlight one vendor profiled in the monograph and mention three other vendors in the NGIA space which are not included in the first edition of the report but for whom I have reports available for a fee.

I want to direct your attention to Knowlesys, an NGIA vendor operating in Hong Kong and the Nanshan District, Shenzhen. On the surface, the company processes Web content. The firm also provides a free download of its scraping software, which is beginning to show its age.

Dig a bit deeper, and Knowlesys provides a range of custom services. These include deploying, maintaining, and operating next generation information access systems for clients. The company’s system can process and make available automatically content from internal, external, and third party providers. Access is available via standard desktop computers and mobile devices:


Source: Knowlesys, 2014.

The system handles both structured and unstructured content in English and a number of other languages.


The company does not reveal its clients and the firm routinely ignores communications sent via the online “contact us” mail form and faxed letters.

How sophisticated is the Knowlesys system? Compared to the other 20 systems analyzed for the CyberOSINT monograph, my assessment is that the company’s technology is on a par with that of other vendors offering NGIA systems. The plus of the Knowlesys system, if one can obtain a license, is that it will handle Chinese and other ideographic languages as well as the Romance languages. The downside is that for some applications, the company’s location in China may be a consideration.

Read more

Recorded Future: Google and Cyber OSINT

February 2, 2015

I find the complaints about Google’s inability to handle time amusing. On the surface, Google seems to demote, ignore, or just not understand the concept of time. For the vast majority of Google service users, Google is no substitute for the users’ investment of time and effort into dating items. But for the wide, wide Google audience, ads, not time, are more important.

Does Google really get an F in time? The answer is, “Nope.”

In CyberOSINT: Next Generation Information Access I explain that Google’s time sense is well developed and of considerable importance to next generation solutions the company hopes to offer. Why the crawfishing? Well, Apple could just buy Google and make the bitter taste of the Apple Board of Directors’ experience a thing of the past.

Now to temporal matters in the here and now.

CyberOSINT relies on automated collection, analysis, and report generation. In order to make sense of data and information crunched by an NGIA system, time is a really key metatag item. To figure out time, a system has to understand:

  • The date and time stamp
  • Versioning (previous, current, and future document, data items, and fact iterations)
  • Times and dates contained in a structured data table
  • Times and dates embedded in content objects themselves; for example, a reference to “last week” or in some cases, optical character recognition of the data on a surveillance tape image.

For the average query, this type of time detail is overkill. For an NGIA system, however, the “time and date” of an event requires disambiguation, determination and tagging of specific time types, and then capturing the date and time data with markers for document or data versions.
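Here is a minimal sketch of the kind of normalization involved: resolving a relative expression such as “last week” against a document’s date stamp and emitting a typed time value. Production temporal taggers use far richer grammars; the patterns and type labels below are illustrative assumptions:

```python
# Minimal sketch: resolve time expressions against a document's date
# stamp and tag each with a time type. The patterns and type labels are
# illustrative; production temporal taggers use far richer grammars.
import re
from datetime import date, timedelta

def tag_times(text, doc_date):
    """Return (expression, resolved_date, time_type) triples."""
    tags = []
    # Explicit dates such as 2015-01-28 resolve on their own: "absolute"
    for m in re.finditer(r"\b(\d{4})-(\d{2})-(\d{2})\b", text):
        y, mo, d = map(int, m.groups())
        tags.append((m.group(0), date(y, mo, d), "absolute"))
    # Expressions anchored to the document's own date stamp: "relative"
    if re.search(r"\blast week\b", text, re.IGNORECASE):
        tags.append(("last week", doc_date - timedelta(weeks=1), "relative"))
    if re.search(r"\byesterday\b", text, re.IGNORECASE):
        tags.append(("yesterday", doc_date - timedelta(days=1), "relative"))
    return tags

doc = "The shipment left last week; confirmation was logged 2015-01-28."
for tag in tag_times(doc, doc_date=date(2015, 2, 2)):
    print(tag)
# ('2015-01-28', datetime.date(2015, 1, 28), 'absolute')
# ('last week', datetime.date(2015, 1, 26), 'relative')
```

Multiply this across versioning, structured tables, and OCR output, and the computational work adds up quickly.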


A simplification of Recorded Future’s handling of unstructured data. The system can also handle structured data and a range of other data management content types. Image copyright Recorded Future 2014.

Sounds like a lot of computational and technical work.

In CyberOSINT, I describe Google’s and In-Q-Tel’s investments in Recorded Future, one of the data forward NGIA companies. Recorded Future has wizards who developed the Spotfire system, which is now part of the Tibco service. There are Xooglers like Jason Hines. There are assorted wizards from Sweden and other countries most US high school students cannot locate on a map, and assorted veterans of high technology start ups.

An NGIA system delivers actionable information to a human or to another system. Conversely, a licensee can build and integrate new solutions on top of the Recorded Future technology. One of the company’s key inventions is numerical recipes that deal effectively with the notion of “time.” Recorded Future uses the name “Tempora” as shorthand for the advanced technology that makes time along with predictive algorithms part of the Recorded Future solution.

Read more

Autonomy: Leading the Push Beyond Enterprise Search

January 30, 2015

In “CyberOSINT: Next Generation Information Access,” I describe Autonomy’s math-first approach to content processing. The reason is that after the veil of secrecy was lifted with regard to the signal processing methods used for British intelligence tasks, Cambridge University became one of the hot beds for the use of Bayesian, Laplacian, and Markov methods. These numerical recipes proved to be both important and controversial. Instead of relying on manual methods, humans selected training sets, tuned the thresholds, and then turned the smart software loose. Math is not required to understand what Autonomy packaged for commercial use: Signal processing separated noise in a channel and allowed software to process the important bits. Thank you, Claude Shannon and the good Reverend Bayes.
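For readers who want a feel for the Bayesian piece, here is a toy sketch of the core pattern: humans supply labeled training sets, word evidence updates the probability that a new document belongs to a category, and the software then runs unattended. This illustrates the general naive Bayes approach, not Autonomy’s IDOL implementation:

```python
# Toy naive Bayes text categorizer: human-selected training sets, then
# unattended categorization. A sketch of the general Bayesian approach,
# not Autonomy's actual implementation.
import math
from collections import Counter

def train(docs_by_label):
    """Count word frequencies per label for likelihood estimates."""
    return {label: Counter(w for d in docs for w in d.lower().split())
            for label, docs in docs_by_label.items()}

def score(text, counts, label):
    """Log-likelihood of the words in `text` under `label`."""
    vocab = {w for c in counts.values() for w in c}
    total = sum(counts[label].values())
    logp = 0.0
    for w in text.lower().split():
        # Laplace smoothing keeps unseen words from zeroing the score.
        logp += math.log((counts[label][w] + 1) / (total + len(vocab)))
    return logp

training = {
    "finance": ["quarterly revenue up", "profit margin widened"],
    "sports":  ["team won the match", "season opener tonight"],
}
counts = train(training)
doc = "revenue and profit fell"
scores = {lbl: score(doc, counts, lbl) for lbl in counts}
print(max(scores, key=scores.get))  # 'finance'
```

The controversy the paragraph mentions lives in the human touches: who picks the training documents and where the assignment thresholds get set.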

What did Autonomy receive for this breakthrough? Not much but the company did generate more than $600 million in revenues about 10 years after opening for business. As far as I know, no other content processing vendor has reached this revenue target. Endeca, for the sake of comparison, flat lined at about $130 million in the year that Oracle bought the Guided Navigation outfit for about $1.0 billion.

For one thing, the British company BAE (British Aerospace) licensed the Autonomy system and began to refine its automated collection, analysis, and report systems. So what? By the late 1990s, the UK became the de facto leader in automated content activities. Was BAE the only smart outfit in the late 1990s? Nope, there were other outfits that realized the value of the Autonomy approach. Examples range from US government entities to little known outfits like the Wynyard Group.

In the CyberOSINT volume, you can get more detail about why Autonomy was important in the late 1990s, including the name of the university professor who encouraged Mike Lynch to make contributions that have had a profound impact on intelligence activities. For color, let me mention an anecdote that is not in the 176 page volume. (Please keep in mind that Autonomy was, like i2, another Cambridge University spawned outfit, a client prior to my retirement.) IBM owns i2, and i2 is profiled in CyberOSINT in Chapter 5, “CyberOSINT Vendors.” I would point out that more than two thirds of the monograph contains information that is either not widely available or not available via a routine Bing, Google, or Yandex query. For example, Autonomy does not make publicly available a list of its patent documents. These contain specific information about how to think about cyber OSINT and moving beyond keyword search.

Some Color: A Conversation with a Faux Expert

In 2003 I had a conversation with a fellow who was an “expert” in content management, a discipline that is essentially a step child of database technology. I want to mention this person by name, but I will avoid the inevitable letter from his attorney rattling a saber over my head. This person publishes reports, engages in litigation with his partners, kowtows to various faux trade groups, and tries to keep secret his history as a webmaster with some Stone Age skills.

Not surprisingly this canny individual had little good to say about Autonomy. The information I provided about the Lynch technology, its applications, and its importance in next generation search were dismissed with a comment I will not forget, “Autonomy is a pile of crap.”

Okay, that’s an informed opinion for a clueless person pumping baloney about the value of content management as a separate technical field. Yikes.

In terms of enterprise search, Autonomy’s competitors criticized Lynch’s approach. Instead of a keyword search utility that was supposed to “unlock” content, Autonomy delivered a framework. The framework operated in an automated manner and could deliver keyword search, point and click access like the Endeca system, and more sophisticated operations associated with today’s most robust cyber OSINT solutions. Enterprise search remains stuck in the STAIRS III and RECON era. Autonomy was the embodiment of the leap from putting the burden of finding on humans to shifting the load to smart software.


A diagram from Autonomy’s patents filed in 2001. What’s interesting is that this patent cites an invention by Dr. Liz Liddy with whom the ArnoldIT team worked in the late 1990s. A number of content experts understood the value of automated methods, but Autonomy was the company able to commercialize and build a business on technology that was not widely known 15 years ago. Some universities did not teach Bayesian and related methods because these were tainted by humans who used judgments to set certain thresholds. See US 6,668,256. There are more than 100 Autonomy patent documents. How many of the experts at IDC, Forrester, Gartner, et al have actually located the documents, downloaded them, and reviewed the systems, methods, and claims? I would suggest a tiny percentage of the “experts.” Patent documents are not what English majors are expected to read.

That’s important and little appreciated by the mid tier outfits’ experts working for IDC (yo, Dave Schubmehl, are you ramping up to recycle the NGIA angle yet?), Forrester (one of whose search experts told me at a MarkLogic event that new hires for search were told to read the information on my ArnoldIT.com Web site, as if that were a good thing for me), Gartner Group (the conference and content marketing outfit), Ovum (the UK counterpart to Gartner), and dozens of other outfits who understand search in terms of selling received wisdom, not insight or hands on facts.

Read more

Enterprise Search Lacks NGIA Functions

January 29, 2015

Users Want More Than Hunting through Rubbish

CyberOSINT: Next Generation Information Access, according to Ric Manning, the publisher of Stephen E Arnold’s new study, is now available. You can order a copy at the Gumroad online store or via the link on Xenky.com.


One of the key chapters in the 176 page study of information retrieval solutions that move beyond search takes you under the hood of an NGIA system. Without reproducing the 10 page chapter and its illustrations, I want to highlight two important aspects of NGIA systems.

When a person requires information under time pressure, traditional systems pose a problem. The time required to figure out which repository to query, craft a query or take a stab at which “facet” (category) may contain the information, scan the outputs the system displays, open a document that appears to be related to the query, and then pinpoint exactly which item of data is the one required makes traditional search a non starter in many work situations. The bottleneck is the human’s ability to keep track of which digital repository contains what. Many organizations have idiosyncratic terminology, and users in one department may not be familiar with the terminology used in another unit of the organization.
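The repository bottleneck is easy to see in miniature. Here is a minimal sketch of the alternative: one query fanned out across every repository, so the user does not have to remember which system holds what. The repository names and contents are hypothetical:

```python
# Minimal sketch: one federated query fanned out to several repositories,
# so the user need not recall which system holds what. The repository
# names and contents are hypothetical.
REPOSITORIES = {
    "wiki":      ["expense policy draft", "travel approval workflow"],
    "email":     ["travel approval granted for Q2"],
    "fileshare": ["q2 travel budget spreadsheet notes"],
}

def federated_search(query):
    """Query every repository and merge hits, labeled by source."""
    terms = query.lower().split()
    hits = []
    for repo, items in REPOSITORIES.items():
        for item in items:
            if all(t in item.lower() for t in terms):
                hits.append((repo, item))
    return hits

print(federated_search("travel approval"))
# [('wiki', 'travel approval workflow'),
#  ('email', 'travel approval granted for Q2')]
```

Federation alone does not solve the idiosyncratic terminology problem, which is why NGIA systems layer automated analysis on top of collection.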


Register for the seminar on the Telestrategies Web site.

Traditional enterprise search systems trip and skin their knees over the time issue and over the “locate what’s needed” issue. These are problems that have persisted in search box oriented systems since the days of RECON, SDC Orbit, and Dialcom. There is little a manager can do to create more time. Time is a very valuable commodity, and it often determines what type of decision is made and how risk laden that decision may be.

There is also little one can do to change how a bright human works with a system that forces a busy individual to perform iterative steps that often amount to guessing the word or phrase to unlock what’s hidden in an index or indexes.

Little wonder that convincing a customer to license a traditional keyword system continues to bedevil vendors.

A second problem is the nature of access. There is news floating around that Facebook has been able to generate more ad growth than Google because Facebook has more mobile users. Whether Facebook or Google dominates social mobile, the key development is “mobile.” Workers need information access from devices which have smaller and different form factors from the multi core, 3.5 gigahertz, three screen workstation I am using to write this blog post.

Read more

Enterprise Search Problems: Why NGIA Systems Push Beyond Traditional Information Access Methods

January 29, 2015

Enterprise search has been useful. However, the online access methods have changed. Unfortunately, most enterprise search systems and the enterprise applications based on keyword and category access have lagged behind user needs.

The information highway is littered with the wrecks of enterprise search vendors who promised a solution to findability challenges and failed to deliver. Some of the vendors have been forgotten by today’s keyword and category access vendors. Do you know about the business problems that disappointed licensees and cost investors millions of dollars? Are you familiar with Convera, Delphes, Entopia, Fulcrum Technologies, Hakia, Siderean Software, and many other companies?


A handful of enterprise search vendors dodged implosion by selling out. Artificial Linguistics, Autonomy, Brainware, Endeca, Exalead, Fast Search, InQuira, iPhrase, ISYS Search Software, and Triple Hop were sold. Thus, their investors received their money back and in some cases received a premium. The $11 billion paid for Autonomy dwarfed the billion dollar purchase prices of Endeca and Fast Search and Transfer. But most of the companies able to sell their information retrieval systems sold for much less. IBM acquired Vivisimo for about $20 million and promptly justified the deal by describing Vivisimo’s metasearch system as a Big Data solution. Okay.

Today a number of enterprise search vendors walk a knife edge. A loss of a major account or a misstep that spooks investors can push a company over the financial edge in the blink of an eye. Recently I noticed that Dieselpoint has not updated its Web site for a while. Antidot seems to have faded from the US market. Funnelback has turned down the volume. Hakia went offline.

A few firms generate considerable public relations noise. Attivio, BA Insight, Coveo, and IBM Watson appear to be competing to become the leaders in today’s enterprise search sector. But today’s market is very different from the world of 2003-2004 when I wrote the first of three editions of the 400 page Enterprise Search Report. Each of these companies asserts that its system provides business intelligence, customer support, and traditional enterprise search. Will any of these companies be able to match Autonomy’s 2008 revenues of $600 million? I doubt it.

The reason is not the availability of open source search. Elasticsearch, in fact, is arguably better than any of the for fee keyword and concept centric information retrieval systems. The problems of the enterprise search sector are deeper.

Read more

Enterprise Search: A Problem of Relevance to the Users

January 23, 2015

I enjoy email from those who read my for fee columns. I received an interesting comment from Australia about desktop search.


In a nutshell, the writer read one of my analyses of software intended for a single user looking for information on his local hard drives. The bigger the hard drives, the greater the likelihood the user will operate in squirrel mode. The idea is that it is easier to save everything because “you never know.” Right, one doesn’t.

Here’s the passage I found interesting:

My concern is that with the very volatile environment where I saw my last mini OpenVMS environment now virtually consigned to the near-legacy basket and many other viable engines disappearing from Desktop search that there is another look required at the current computing environment.

I referred this person to Gaviri Search, which I use to examine email, and Effective File Search, which is useful for looking in specific directories. These suggestions sidestepped the larger issue:

There is no fast, easy to use, stable, and helpful way to look for information on a couple of terabytes of local storage. The files are a mixed bag: Excels, PowerPoints, image and text embedded PDFs, proprietary file formats like Framemaker, images, music, etc.

Such was the problem in the old days, and such is the problem today. I don’t have a quick and easy fix. But these are single user problems, not an enterprise scale problem.
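For the curious, the bare bones of the single user problem look like this: walk the directories, pull what text you can, and build an inverted index. The sketch below handles only plain text files; the hard part, as the reader’s email suggests, is the mixed bag of proprietary formats. The path in the usage comment is hypothetical:

```python
# Bare-bones local file indexer: walk a directory tree, index plain text
# files, answer AND queries. A sketch of the single-user problem; real
# desktop search must also crack PDFs, Office files, and email stores.
import os
import re
from collections import defaultdict

def build_index(root):
    """Map each word to the set of file paths containing it."""
    index = defaultdict(set)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith((".txt", ".md", ".csv")):
                continue  # proprietary formats need dedicated parsers
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    for word in re.findall(r"[a-z0-9]+", f.read().lower()):
                        index[word].add(path)
            except OSError:
                continue  # unreadable file; skip and move on
    return index

def lookup(index, query):
    """Return files containing every query term."""
    sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*sets) if sets else set()

# index = build_index("/home/user/documents")  # hypothetical path
# print(lookup(index, "quarterly revenue"))
```

Even this toy shows why terabytes of mixed content defeat quick fixes: every additional file format adds a parser, and every parser adds failure modes.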

An hour after I read the email about my column, I received one of those frequent LinkedIn updates. The title of the thread to which LinkedIn wished to call my attention was/is: “What would you guess is behind a drop in query activity?”


I was enticed by the word “guess.” Most assume that the specialist discussion threads on LinkedIn attract the birds with the brightest plumage, not the YouTube commenter crowd.

I navigated to the provided link which may require that you become a member of LinkedIn and then appeal for admission to the colorful feather discussion for “Enterprise Search Professionals.”

The situation is that a company’s enterprise search engine is not being used by its authorized users. There was a shopping list of ideas for generating traffic to the search system. The reason is that the company spent money, invested human resources, and assumed that a new search system would deliver a benefit that the accountants could quantify.

What was fascinating was the response of the LinkedIn enterprise search professionals. The suggestions for improving the enterprise search engine included:

  • Asking for more information about usage. (Interesting, but the operative fact is that traffic is low, a fact evident to the expert initiating the thread.)
  • A thought that the user interface and “global navigation” might be an issue.
  • The idea that an “external factor” was the cause of the traffic drop. (Intriguing because I would include the search for a personal search system described in the email about my desktop search column as an “external factor.” The employee looking for a personal search solution was making lone wolf noises to me.)
  • A former English major’s insight that traffic drops when quality declines. I was hoping for a quote from a guy like Aristotle who said, “Quality is not an act, it is a habit.” The expert referenced “social software.”
  • My tongue in cheek suggestion that the search system required search engine optimization. The question sparked Sturm und Drang about enterprise search as something different from the crass Web site marketing hoopla.
  • A comment about the need for users to understand the vocabulary required to get information from an index of content and “search friendly” pages. (I am not sure what a search friendly page is, however. Is it what an employee creates, an interface, or a canned, training wheels “report”?)

Let’s step back. The email about desktop search and this collection of statements about lack of usage strike me as different sides of the same information access coin.

Read more

Enterprise Search Lags Behind: Actionable Interfaces, Not Lists, Needed

January 22, 2015

I was reviewing the France24.com item “Paris Attacks: Tracing Shadowy Terrorist Links.” I came across this graphic:


Several information-access thoughts crossed my mind.

First, France24 presented information that looks like a simplification of the outputs generated by a system like IBM’s i2. (Note: I was an advisor to i2 before its sale to IBM.) i2 is an NGIA or next generation information access system which dates from the 1990s. The notion that crossed my mind is that this relationship diagram presents information in a more useful way than a list of links. After 30 years, I wondered, “Why haven’t traditional enterprise search systems shifted from lists to more useful information access interfaces?” Many vendors have, and the enterprise search vendors that stick to the stone club approach are missing what seems to be a quite obvious approach to information access.


A Google results list with one ad, two Wikipedia items, pictures, and redundant dictionary links. Try this query: “IBM Mainframe.” Not too helpful unless one is looking for information to use in a high school research paper.

Second, the use of this i2-type diagram, now widely emulated, from Fast Search centric outfits like Attivio to high flying venture backed outfits like Palantir, permits one click access to relevant information. The idea is that a click on a hot spot—a plus in the diagram—presents additional information. I suppose one could suggest that the approach is just a form of faceting or “Guided Navigation,” which is Endeca’s very own phrase. I think the differences are more substantive. (I discuss these in my new monograph CyberOSINT.)

Third, no time is required to figure out what’s important. i2 and some other NGIA systems present what’s important, identify key data points, and explain what is known and what is fuzzy. Who wants to scan, click, read, copy, paste, and figure out what is relevant and what is not? I don’t for many of my information needs. The issue of “time spent searching” is an artifact of the era when Boolean reigned supreme. NGIA systems automatically generate indexes that permit alternatives to a high school term paper’s approach to research.
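To suggest what sits behind a relationship diagram of this kind, here is a toy sketch that builds a graph from entity co-occurrence and ranks the most connected entities, the “hot spots” a display would make clickable. A simplified illustration, not i2’s or Palantir’s method; the entities are invented:

```python
# Toy relationship graph from entity co-occurrence: entities appearing in
# the same document get a weighted edge; the best connected entities are
# the "hot spots" a diagram highlights. Not i2's or Palantir's method.
from collections import defaultdict
from itertools import combinations

docs = [  # entities extracted per document (extraction step omitted)
    {"Person A", "Person B", "Bank X"},
    {"Person B", "Bank X", "Charity Y"},
    {"Person A", "Bank X"},
]

def build_graph(entity_sets):
    """Weight each edge by how many documents link the two entities."""
    edges = defaultdict(int)
    for entities in entity_sets:
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges

def hot_spots(edges, top=3):
    """Rank entities by total connection weight."""
    degree = defaultdict(int)
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return sorted(degree.items(), key=lambda kv: -kv[1])[:top]

print(hot_spots(build_graph(docs)))
# [('Bank X', 5), ('Person B', 4), ('Person A', 3)]
```

The display layer then renders the graph and attaches the underlying documents to each node, which is the one click access described above.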

Little wonder that the participants in enterprise search discussion groups gnaw bones that have been chewed for more than 50 years. There is no easy solution to the hurdles that search boxes and lists of results present to many users of online systems.

France24 gets it. When will the search vendors dressed in animal skins and carrying stone tools figure out that the world has changed? Demographics, access devices, and information have moved on.

Most enterprise search vendors deliver systems that could be exhibited in the Smithsonian next to the Daystrom 046 Little Gypsy mainframe and the IBM punch card machine.

Stephen E Arnold, January 22, 2015
