IBM Watson: Predicting the Future

July 12, 2017

I enjoy IBM’s visions of the future. One exception: the company’s revenue estimates for the Watson product line. I read “IBM Declares AI the Key to Making Unstructured Data Useful.” For me, the “facts” in the write up are a bit like a Payday candy bar: a few nuts pressed into a squishy core of questionable nutritional value.

I noted this factoid:

80 percent of company data is unstructured, including free-form documents, images, and voice recordings.

I have been interested in the application of the 80-20 rule to certain types of estimates. The problem is that the “principle of factor sparsity” gets disconnected from the underlying data. Generalizations are just so darned fun and easy. The mathematical rigor necessary to validate a generalization is just too darned much work. The “hey, I’ve got a meeting” or the more common “I need to check my mobile” gets in the way of figuring out whether the 80-20 statement makes sense.

My admittedly inept encounters with data suggest that the volume of unstructured data is high, higher than the 80 percent in the rule. The problem is that today’s systems struggle to:

  • Make sense of massive streams of unstructured data from outfits like YouTube, clear text and encrypted text messages, and the information blasted about on social media
  • Identify the important items of content directly germane to a particular matter
  • Figure out how to convert content processing into useful elements like named entities and relate those entities to code words and synonyms
  • Perform cost effective indexing of content streams in near real time.
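A toy sketch makes the entity step in the list above concrete. The gazetteer and the code-word table below are invented for illustration; production systems use statistical taggers rather than lookup tables, but the shape of the task, finding entities and relating them to code words and synonyms, is the same:

```python
import re

# Toy gazetteer-based entity extractor. The entity lists and the
# code-word/synonym mappings are invented for illustration only.
GAZETTEER = {
    "PERSON": ["Aylin Caliskan", "Michael Lynch"],
    "ORG": ["IBM", "Macy's", "Amazon"],
}
# Map code words and synonyms back to a canonical entity name.
SYNONYMS = {"Big Blue": "IBM", "the everything store": "Amazon"}

def extract_entities(text):
    """Return deduplicated (entity_type, canonical_name) pairs found in text."""
    found = []
    for alias, canonical in SYNONYMS.items():
        if alias.lower() in text.lower():
            found.append(("ORG", canonical))
    for etype, names in GAZETTEER.items():
        for name in names:
            if re.search(re.escape(name), text):
                found.append((etype, name))
    return list(dict.fromkeys(found))  # preserve order, drop duplicates

print(extract_entities("Big Blue, also known as IBM, pitched Watson to Macy's."))
```

Even this trivial version hints at the hard part: the synonym table has to exist before indexing starts, and someone has to maintain it as the content streams change.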

At this time, systems designed to extract actionable information from relatively small chunks of content are improving. But these systems typically break down when the volume exceeds the budget and computing resources available to those trying to “make sense” of the data in a finite amount of time. The constraints are partly financial, as in “who has the money available right now to process these streams?” They are also practical, as in “what do we do with the data in this dialect from northern Afghanistan?” And there are other questions.

My problem with the IBM approach is that marketing fluffiness absorbs the realities of volume, of interrelating structured and semi-structured data, and of multilingual content: the bumps in the information superhighway along which Watson seems to speed.

I loved this passage:

Chatterjee highlighted Macy’s as an example of an IBM customer that’s using the company’s tools to better personalize customers’ shopping experiences using AI. The Macy’s On Call feature lets customers get information about what’s in stock and other key details about the contents of a retail store, without a human sales associate present. It uses Watson’s natural language understanding capabilities to process user queries and provide answers. Right now, that feature is available as part of a pilot in 10 Macy’s stores.

Yep, I bet that Macy’s is going to hit a home run against the fast ball pitching of Jeff Bezos’ Amazon Prime team. Let’s ask Watson. On the other hand, let’s ask Alexa.

Stephen E Arnold, July 12, 2017

Can an Algorithm Tame Misinformation Online?

June 23, 2017

UCLA researchers are working on an algorithmic solution to the “fake news” problem, we learn from the article, “Algorithm Reads Millions of Posts on Parenting Sites in Bid to Understand Online Misinformation” at TechRadar. Okay, it’s actually indexing and text analysis, not “reading,” but we get the idea. Reporter Duncan Geere tells us:

There’s a special logic to the flow of posts on a forum or message board, one that’s easy to parse by someone who’s spent a lot of time on them but kinda hard to understand for those who haven’t. Researchers at UCLA are working on teaching computers to understand these structured narratives within chronological posts on the web, in an attempt to get a better grasp of how humans think and communicate online.

Researchers used the hot topic of vaccinations, as discussed on two parenting forums, as their test case. Through an examination of nearly two million posts, the algorithm was able to derive an accurate “narrative framework.” Geere writes:

While this study was targeted at conversations around vaccination, the researchers say the same principles could be applied to any topic. Down the line, they hope it could allow for false narratives to be identified as they develop and countered by targeted messaging.

The phrase “down the line” is incredibly vague, but the sooner the better, we say (though we wonder exactly what form this “targeted messaging” will take). The original study can be found here at eHealth publisher JMIR Publications.

Cynthia Murrell, June 23, 2017

 

Editorial Controls and Data Governance: A Rose by Any Other Name?

June 16, 2017

I read “Why Interest in ‘Data Governance’ Is Increasing.” The write up uses a number of terms to describe what I view as editorial controls. The idea, in my experience, is that an organization decides what is okay and not okay with regard to the information it wants to process. The objective is to know what content will be processed before the organization kick-starts indexing, metadata tagging, or text analysis.

The organization then has to figure out and implement the rules of the game. Questions like “What do we do when entities are not recognized?” and “Who goes through the exceptions file?” must be answered. Rules, procedures, processes, and corrective actions have to be implemented in the work flow. One cannot calculate costs, headcount, or software expenses unless one knows what’s going to happen.

The write up explains that data governance is important. I agree. The write up hooks the notion of editorial controls and editorial process to a number of buzzwords. I don’t think this type of jargon catalog is particularly helpful. Jargon distracts some people from focusing on Job One; that is, putting appropriate controls in place before nuking the budget or creating the type of editorial craziness which Facebook and Google are now trying to contain and manage.

The notion that an organization has to perform “data program management” is fine. But this is nothing more than hooking the editorial rules of the road to the responsibilities of the people who have to set up, oversee, and change the work flow.

Jargon does not help implement editorial controls. Clear thinking and speaking do.

Stephen E Arnold, June 16, 2017

AI Decides to Do the Audio Index Dance

June 14, 2017

Did you ever wonder how search engines can track down the most minuscule piece of information? Their power resides in indices that catalog Web sites, images, and books. Audio content is harder to index because most indices rely on static words and images. However, Audioburst plans to change that, says Venture Beat in the article, “How Audioburst Is Using AI To Index Audio Broadcasts And Make Them Easy To Find.”

Who exactly is Audioburst?

Founded in 2015, Audioburst touts itself as a “curation and search site for radio,” delivering the smarts to render talk radio in real time, index it, and make it easily accessible through search engines. It does this through “understanding” the meaning behind audio content and transcribes it using natural language processing (NLP). It can then automatically attach metadata so that search terms entered manually by users will surface relevant audio clips, which it calls “bursts.”

Audioburst recently earned $6.7 million in funding and announced a new API. The API allows third-party developers to tap Audioburst’s content library to feature audio-based feeds in their own applications, in-car entertainment systems, and other connected devices. There is a growing demand for audio content as more people digest online information via sound bites, use voice searches, and make use of digital assistants.
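The workflow the quote describes, transcribe the audio, attach keyword metadata, then match user queries to clips, can be sketched in a few lines. The transcripts, stopword list, and function names below are stand-ins invented for illustration, not Audioburst’s actual API:

```python
# Toy "burst" pipeline: hard-coded transcripts stand in for NLP-produced
# transcriptions; keyword metadata is attached and then queried.
bursts = [
    {"id": "b1", "transcript": "talk radio segment about electric cars"},
    {"id": "b2", "transcript": "morning show debate on city budgets"},
]

STOPWORDS = {"about", "on", "the", "a"}

def tag(burst):
    """Attach a crude keyword set as metadata (stand-in for real NLP tagging)."""
    burst["keywords"] = {w for w in burst["transcript"].split() if w not in STOPWORDS}
    return burst

def find_bursts(query):
    """Return ids of bursts whose keyword metadata overlaps the query terms."""
    terms = set(query.lower().split())
    return [b["id"] for b in map(tag, bursts) if terms & b["keywords"]]

print(find_bursts("electric cars"))
```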

Finding “printed” information on the Internet is easy; finding a specific audio file is not. Audioburst hopes to revolutionize how people find and use sound. The company should consider a partnership with Bitext, because indexing audio could benefit from advanced linguistics. Bitext’s technology would make this application more accurate.

Whitney Grace, June 14, 2017

Yandex Learns Search Can Be Exciting

June 6, 2017

I am not sure if this Thomson Reuters “real news” story is accurate. I found it amusing. You are on your own with this item, gentle reader.

I read “Investigators Search Ukrainian Offices of Russia’s Yandex.” The main point struck me as:

Ukraine’s State Security Service (SBU) raided the local offices of Russia’s top search site Yandex on Monday in an operation that SBU spokesman Olena Gitlyanska said was part of a treason investigation.

The operative word is treason. Exciting, right?

Yandex has previously said it operates fully in accordance with Ukrainian law. It does not expect sanctions to have a material negative impact on its business.

Let’s assume that the “real news” is accurate. The idea that a Web indexing company is guilty of treason is interesting. I know that in my work with a parents’ group to identify potentially harmful sites for their children, I use Yandex as an example.

Ukrainian officials did not reference Yandex’s more interesting indexing policies. That’s a shame. Treason may be more important to the Ukrainian government than links to certain interesting types of videos.

Treason can have a “material negative impact,” however.

Stephen E Arnold, June 5, 2017

Antidot: Fluid Topics

June 5, 2017

I find French innovators creative. Over the years I have admired the visualizations of DATOPS, the architecture of Exalead, the wonkiness of Kartoo, the intriguing Semio, and the numerous attempts to federate data and workflow in the manner of digital librarians and subject matter experts. The Descartes- and Fermat-inspired engineers created software and systems which try to trim the pointy end off many information thorns.

I read “Antidot Enables ‘Interactive’ Tech Docs That Are Easier To Publish, More Relevant To Users – and Actually Get Read.” Antidot, for those not familiar with the company, was founded in 1999. Today the company bills itself as a specialist in semantic search and content classification. The search system is named Taruqa, and the classification component is called “Classifier.”

The Fluid Topics product combines a number of content processing functions in a workflow designed to provide authorized users with the right information at the right time.

According to the write up:

Antidot has updated its document delivery platform with new features aimed at making it easier to create user-friendly interactive docs.  Docs are created and consumed thanks to a combination of semantic search, content enrichment, automatic content tagging and more.

The phrase “content enrichment” suggests to me that multiple indexing and metadata identification subroutines crunch on text. The idea is that a query can be expanded, tap into entity extraction, and make use of text analytics to identify documents which keyword matching would overlook.
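A toy example shows why query expansion catches documents that literal keyword matching would overlook. The synonym table and sample documents below are invented; a real system would draw its expansions from entity extraction and text analytics, as the write up suggests:

```python
# Toy query expansion. Synonym table and documents invented for illustration.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "doc": ["document", "manual"],
}

def expand_query(terms):
    """Add known synonyms to the literal query terms."""
    expanded = set(terms)
    for t in terms:
        expanded.update(SYNONYMS.get(t, []))
    return expanded

docs = {
    1: "service manual for the vehicle",
    2: "press release about cars",
}

def search(terms):
    """Return ids of documents sharing at least one (expanded) term."""
    q = expand_query(terms)
    return [i for i, text in docs.items() if q & set(text.split())]

print(search(["car", "doc"]))  # literal match on "car"/"doc" would find nothing
```

Document 1 never contains the words “car” or “doc,” yet the expanded query surfaces it via “vehicle” and “manual.” That is the whole pitch of content enrichment in one function.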

The Fluid Topic angle is that documentation and other types of enterprise information can be indexed and matched to a user’s profile or to a user’s query. The result is that the needed document is findable.

The slicing and dicing of processed content makes it possible for the system to assemble snippets or complete documents into an “interactive document.” The idea is that most workers today are not too thrilled to get a results list and the job of opening, scanning, extracting, and closing links. The Easter egg hunt approach to finding business information is less entertaining than looking at Snapchat images or checking what’s new with pals on Facebook.

The write up states:

Users can read, search, navigate, annotate, create alerts, send feedback to writers, with a rich and intuitive user experience.

I noted this list of benefits from the Fluid Topics’ approach:

  • Quick, easy access to the right information at the right time, making searching for technical product knowledge really efficient.
  • Combine and transform technical content into relevant, useful information by slicing and dicing data from virtually any source to create a unified knowledge hub.
  • Freedom for any user to tailor documentation and provide useful feedback to writers.
  • Knowledge of how documentation is actually used.

Applications include:

  • Casual publishing, which lets a user create a “personal” book of content and share it.
  • Content organization, which brings order to the often chaotic and scattered source information.
  • Markdown, which means formatting information in a consistent way.

Fluid Topics is a hybrid which combines automatic indexing and metadata extraction, search, and publishing.

More information about Fluid Topics is available at a separate Antidot Web site called “Fluid Topics.” The company provides a video which explains how you can transform your world when you tackle search, customer support, and content federation and repurposing. Fluid Topics also performs text analytics for the “age of limitless technical content delivery.”

Hewlett Packard invested significantly in workflow based content management technology. MarkLogic’s XML data management system can be tweaked to perform similar functions. Dozens of other companies offer content workflow solutions. The sector is active, but sales cycles are lengthy. Crafty millennials can make Slack perform some content tricks as well. Those on a tight budget might find that Google’s hit and miss services are good enough for many content operations. For those in love with SharePoint, even that remarkable collection of fragmented services, APIs, and software can deliver good enough solutions.

I think it is worth watching how Antidot’s Fluid Topics performs in what strikes me as a crowded, volatile market for content federation and information workflow.

Stephen E Arnold, June 5, 2017

SEO Adapts to Rapidly Changing Algorithms

May 30, 2017

When we ponder the future of search, we consider factors like the rise of “smart” searching—systems that deliver what they think the user wants, instead of what the user asks for—and how facial recognition search is progressing. Others look from different angles, though, like the business-oriented Inc., which shares the post, “What is the Future of Search?” Citing SEO expert Baruch Labunski, writer Drew Hendricks looks at how rapid changes to search engines’ ranking algorithms affect search-engine-optimization marketing efforts.

First, companies must realize that it is now essential that their sites play well with mobile devices; Google is making mobile indexing a priority. We learn that the rise of virtual assistants raises the stakes—voice-controlled searches only return the very first search result. (A reason, in my opinion, to use them sparingly for online searches.) The article pays the most attention, though, to addressing local search. Hendricks advises:

By combining the highly specific locational data that’s available from consumers searching on mobile, alongside Google’s already in-progress goal of customizing results by location for all users, positioning your brand to those who are physically near you will become crucial in 2017. …

 

Our jobs as brand managers and promoters will continue to become more complicated as time passes. The days of search engine algorithms filtering by obvious data points, or being easily manipulated, are over. The new fact of search engine optimization is appealing to your immediate markets – those around you and those who are searching directly for your product.

Listing one’s location(s) on myriad review sites and Google Places and placing the address on the company website are advised. The piece concludes by reassuring marketers that, as long as they make careful choices, they can successfully navigate the rapid changes to Google and other online search engines.

Cynthia Murrell, May 30, 2017

AI Not to Replace Lawyers, Not Yet

May 9, 2017

Robot or AI lawyers may be effective at locating relevant cases for reference, but they are far from replacing human lawyers, who still need to go to court and represent clients.

ReadWrite, in a recently published analytical article titled “Look at All the Amazing Things AI Can (and Can’t Yet) Do for Lawyers,” says:

Even if AI can scan documents and predict which ones will be relevant to a legal case, other tasks such as actually advising a client or appearing in court cannot currently be performed by computers.

The author further explains what the present generation of AI tools and robots actually does. They merely find relevant cases based on indexing and keywords, a task that used to be time-consuming and cumbersome. Thus, what robots eliminate is the tedious work once performed by interns or lower-level employees. Lawyers still need to collect evidence, prepare the case, and argue in court to win. The robots are coming, but only for the lower-level jobs, not to snatch lawyers’ positions.

Vishal Ingole, May 9, 2017

Image Search: Biased by Language. The Fix? Use Humans!

April 19, 2017

Houston, we (male, female, uncertain) have a problem. Bias is baked into some image analysis and just about every other type of smart software.

The culprit?

Numerical recipes.

The first step in solving a problem is to acknowledge that a problem exists. The second step is more difficult.

I read “The Reason Why Most of the Images That Show Up When You Search for Doctor Are White Men.” The headline identifies the problem. However, what does one do about biases rooted in human utterance?

My initial thought was to eliminate human utterances. No fancy dancing required. Just let algorithms do what algorithms do. I realized that although this approach has a certain logical completeness, implementation may meet with a bit of resistance.

What does the write up have to say about the problem? (Remember. The fix is going to be tricky.)

I learned:

Research from Princeton University suggests that these biases, like associating men with doctors and women with nurses, come from the language taught to the algorithm. As some data scientists say, “garbage in, garbage out”: Without good data, the algorithm isn’t going to make good decisions.

Okay, right coast thinking. I feel more comfortable.

What does the write up present as wizard Aylin Caliskan’s view of the problem? A post doctoral researcher seems to be a solid choice for a source. I assume the wizard is a human, so perhaps he, she, it is biased? Hmmm.

I highlighted in true blue several passages from the write up / interview with he, she, it. Let’s look at three statements, shall we?

Regarding genderless languages like Turkish:

when you directly translate, and “nurse” is “she,” that’s not accurate. It should be “he or she or it” is a nurse. We see that it’s making a biased decision—it’s a very simple example of machine translation, but given that these models are incorporated on the web or any application that makes use of textual data, it’s the foundation of most of these applications. If you search for “doctor” and look at the images, you’ll see that most of them are male. You won’t see an equal male and female distribution.

If accurate, this observation means that the “fix” is going to be difficult. Moving from a language without gender identification to a language with gender identification requires changing the target language. Easy for software. Tougher for a human. If the language and its associations are anchored in the brain of a target language speaker, change may be, how shall I say it, a trifle difficult. My fix looks pretty good at this point.
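The bias the researcher describes is easy to reproduce in miniature. The co-occurrence counts below are invented for illustration, but any statistical translator that renders a genderless pronoun by picking the pronoun most frequently associated with a profession in its training data will behave this way:

```python
# Toy demonstration of translation bias: pick the pronoun that co-occurs
# most often with a profession. Counts are invented for illustration.
cooccurrence = {
    "nurse":  {"he": 120, "she": 880},
    "doctor": {"he": 900, "she": 100},
}

def translate_pronoun(profession):
    """Render a genderless pronoun (e.g., Turkish 'o') as an English one."""
    counts = cooccurrence[profession]
    return max(counts, key=counts.get)

print(translate_pronoun("nurse"))   # the biased default
print(translate_pronoun("doctor"))
```

The model is “accurate” with respect to its data and biased with respect to the world, which is exactly the tension the interview describes.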

And what about images and videos? I learned:

Yes, anything that text touches. Images and videos are labeled so they can be used on the web. The labels are in text, and it has been shown that those labels have been biased.

And the fix is a human doing the content selection, indexing, and dictionary tweaking? Not so fast. The cost of indexing with humans is very high. Don’t believe me? Download 10,000 Wikipedia articles and hire some folks to index them from a controlled term list humans set up. Let me know if you can hit $17 per indexed article. My hunch is that you will exceed this target by several orders of magnitude. (Want to know where the number comes from? Contact me and we will discuss a for-fee deal for this high-value information.)
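The back-of-the-envelope economics are simple to check. The $17 target and the 10,000-article corpus come from the paragraph above; the order-of-magnitude multipliers are only the author’s hunch, scaled here for illustration:

```python
# Scale the human-indexing cost: $17/article floor across 10,000 articles,
# then show what "exceeding by orders of magnitude" would mean in dollars.
articles = 10_000
target_cost_per_article = 17  # the floor cited in the text

floor = articles * target_cost_per_article
print(f"Floor at $17/article: ${floor:,}")
for oom in (1, 2, 3):
    print(f"Exceeding by 10^{oom}: ${floor * 10**oom:,}")
```

Even the floor is $170,000 for a corpus that is a rounding error next to real enterprise content flows, which is the point.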

How does the write up solve the problem? Here’s the capper:

…you cannot directly remove the bias from the dataset or model because it’s giving a very accurate representation of the world, and that’s why we need a specialist to deal with this at the application level.

Notice that my solution is to eliminate humans entirely. Why? The pipe dream of humans doing indexing won’t fly due to [a] time, [b] cost, [c] the massive flows of data to index. Forget the mother of all bombs.

Think about the mother of all indexing backlogs. The gap would make the Modern Language Association’s “gaps” look like a weekend catch-up party. Is this a job for the operating system for machine intelligence?

Stephen E Arnold, April 17, 2017

You Do Not Search. You Insight.

April 12, 2017

I am delighted, thrilled. I read “Coveo, Microsoft, Sinequa Lead Insight Engine Market.” What a transformation is captured in what looks to me like a content marketing write up. Keyword search morphs into “insight.” For folks who do not follow the history of enterprise search with the fanaticism of those involved in baseball statistics, the use of the word “insight” to describe locating a document is irrelevant. Do you search or insight?

For me, hunkered down in rural Kentucky, with my monitors flickering in the intellectual darkness of Kentucky, the use of the word “insight” is a linguistic singularity. Maybe not on the scale of an earthquake in Italy or a banker leaping from his apartment to the Manhattan asphalt, but a historical moment nevertheless.

Let me recap some of my perceptions of the three companies mentioned in the headline to this tsunami of jargon in the Datanami story:

  • Coveo is a company which developed a search and retrieval system focused on Windows. With some marketing magic, the company explained keyword search as customer support, then Big Data, and now this new thing, “insight.” For those who track vendor history, the roots of Coveo reach back to a consumer interface designed to make search easy. Remember Copernic? Yep, Coveo has been around a long while.
  • Sinequa also was a search vendor. Like Exalead and Polyspot and other French search vendors, the company wanted to manage data, provide federation, and enable workflows. After a change of president and some executive shuffling, Sinequa emerged as a Big Data outfit with a core competency in analytics. Quite a change. How similar is Sinequa to enterprise search? Pretty similar.
  • Microsoft. I enjoyed the “saved by the bell” deal in 2008 which delivered the “work in progress” Fast Search & Transfer enterprise search system to Redmond. Fast Search was one of the first search vendors to combine fast-flying jargon with a bit of sales magic. Despite the financial meltdown and an investigation of the Fast Search financials, Microsoft ponied up $1.2 billion and reinvented SharePoint search. Well, not exactly reinvented, but SharePoint is a giant hairball of content management, collaboration, business “intelligence” and, of course, search. Here’s a user friendly chart to help you grasp SharePoint search.

[Chart: SharePoint search components]

Flash forward to this Datanami article and what do I learn? Here’s a paragraph I noted with a smiley face and an exclamation point:

Among the areas where natural language processing is making inroads is so-called “insight engines” that are projected to account for half of analytic queries by 2019. Indeed, enterprise search is being supplanted by voice and automated voice commands, according to Gartner Inc. The market analyst released its latest “Magic Quadrant” rankings in late March that include a trio of “market leaders” along with a growing list of challengers that includes established vendors moving into the nascent market along with a batch of dedicated startups.

There you go. A trio like ZZ Top with number one hits? Hardly. A consulting firm’s “magic” plucks these three companies from a chicken farm and gives each a blue ribbon. Even though we have chickens in our backyard, I cannot tell one from another. Subjectivity, not objectivity, applies to picking good chickens, and that seems to be what New York consulting firms do too.

Are the “scores” for the objective evaluations based on company revenue? No.

Return on investment? No.

Patents? No.

IRR? No. No. No.

Number of flagship customers like Amazon, Apple, and Google type companies? No.

The ranking is based on “vision.” And another key factor is the “ability to execute” its “strategy.” There you go. A vision is what I want to help me make my way through Kabul. I need a strategy beyond “stay alive.”

What would I do if I have to index content in an enterprise? My answer may surprise you. I would take out my check book and license these systems.

  1. Palantir Technologies or Centrifuge Systems
  2. Bitext’s Deep Linguistic Analysis platform
  3. Recorded Future.

With these three systems I would have:

  1. The ability to locate an entity, concept, event, or document
  2. The capability to process content in more than 40 languages, perform subject-verb-object parsing and entity extraction in near real time
  3. Point-and-click predictive analytics
  4. Point-and-click visualization for financial, business, and military warfighting actions
  5. Numerous programming hooks for integrating other nifty things that I need to achieve an objective such as IBM’s Cybertap capability.

Why is there a logical and factual disconnect between what I would do to deliver real world, high value outputs to my employees and what the New York-Datanami folks recommend?

Well, “disconnect” may not be the right word. Have some search vendors and third party experts embraced the concept of “fake news” or the know-how explained in Propaganda, Father Ellul’s important book? Is the idea something along the lines of “we just say anything and people will believe our software works this way”?

Many vendors stick reasonably close to the factual performance of their software and systems. Let me highlight three examples.

First, Darktrace, a company crafted by Dr. Michael Lynch, is a stickler for explaining what the smart software does. In a recent exchange with Darktrace, I learned that Darktrace’s senior staff bristle when a descriptive write up strays from the actual, verified technical functions of the software system. Anyone who has worked with Dr. Lynch and his senior managers knows that these people can be very persuasive. But when it comes to Darktrace, it is “facts R us”, thank you.

Second, Recorded Future takes a similar hard stand when explaining what the Recorded Future system can and cannot do. Anyone who suggests that Recorded Future predictive analytics can identify the winner of the Kentucky Derby a day before the race will be disabused of that notion by Recorded Future’s engineers. Accuracy is the name of the game at Recorded Future, but accuracy relates to the use of numerical recipes to identify likely events and assign a probability to some events. Even though the company deals with statistical probabilities, adding marketing spice to the predictive system’s capabilities is a no-go zone.

Third, Bitext, the company that offers a Deep Linguistics Analysis platform to improve the performance of a range of artificial intelligence functions, is anchored in facts. On a recent trip to Spain, we interviewed a number of the senior developers at this company and learned that Bitext software works. Furthermore, the professionals are enthusiastic about working for this linguistics-centric outfit because it avoids marketing hyperbole. “Our system works,” said one computational linguist. This person added, “We do magic with computational linguistics and deep linguistic analysis.” I like that—magic. Oh, Bitext does sales too, with the likes of Porsche, Volkswagen, and the world’s leading vendor of mobile systems and services, among others. And from Madrid, Spain, no less. And without marketing hyperbole.

Why then are companies based on keyword indexing, with a sprinkle of semantics and basic math, repositioning themselves by chasing each new spun-sugar-encrusted trend?

I have given a tiny bit of thought to this question.

In my monograph “The New Landscape of Search” I made the point that search had become devalued, a free download in open source repositories, and a utility like cat or dir. Most enterprise search systems have failed to deliver results painted in Technicolor in sales presentations and marketing collateral.

Today, if I want search and retrieval, I just use Lucene. In fact, Lucene is more than good enough; it is comparable to most proprietary enterprise search systems. If I need support, I can ring up Elastic or one of many vendors eager to gild the open source lily.
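The “good enough” claim is easy to believe once you see how little code the core of keyword retrieval requires. Here is a minimal inverted index with AND queries, the kernel of what Lucene does, minus analyzers, scoring, and everything else; the sample documents are invented:

```python
from collections import defaultdict

# Minimal inverted index: token -> set of document ids containing it.
# Sample documents invented for illustration.
docs = {
    1: "enterprise search systems index documents",
    2: "open source search is good enough",
    3: "watson predicts the future",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():
        index[token].add(doc_id)

def query(*terms):
    """AND query: ids of documents containing every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []

print(query("search"))          # [1, 2]
print(query("search", "open"))  # [2]
```

Everything that makes enterprise search hard, connectors, languages, entity extraction, relevance, sits outside this loop, which is exactly why the keyword core became a commodity.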

The extreme value and reliability of open source search and retrieval software have, in my opinion, gutted the market for proprietary search and retrieval software. The financial follies of Fast Search & Transfer reminded some investors of the costly failures of Convera, Delphes, and Entopia, among others I documented on my Xenky.com site at this link.

Recently most of the news I see on my coal fired computer in Harrod’s Creek about enterprise search has been about repositioning, not innovation. What’s up?

The answer seems to be that the myth cherished by many was that enterprise search was the one true way to make sense of digital information. What many organizations learned was that good enough search does the basic blocking and tackling of finding a document but precious little else without massive infusions of time, effort, and resources.

But do enterprise search systems, no matter how many sparkly buzzwords, actually work? Not many do, no matter what publicly traded consulting firms tell me to believe.

Snake oil? I don’t know. I just know my own experience, and after 45 years of trying to make digital information findable, I avoid fast talkers with covered wagons adorned with slogans.


What happens when an enterprise search system is fed videos, podcasts, telephone intercepts, flows of GPS data, and a couple of proprietary file formats?

Answer: Not much.

The search system has to be equipped with extra-cost connectors, assorted oddments, and shimware to deal with a recorded webinar and a companion deck of PowerPoint slides used by the corporate speaker.

What happens when the content stream includes email and documents in six, 12, or 24 different languages?

Answer: Mad scrambling until the proud licensee of an enterprise search system can locate a vendor able to support multiple language inputs. The real life needs of an enterprise are often different from what the proprietary enterprise search system can deal with.

That’s why I find the repositioning of enterprise search technology a bit like a clown with a sad face. The clown is no longer funny. The unconvincing efforts to become something else clash with the sad face, the red nose, and the worn shoes still popular in Harrod’s Creek, Kentucky.


When it comes to enterprise search, my litmus test is simple: If a system is keyword centric, it isn’t going to work for some of the real world applications I have encountered.

Oh, and don’t believe me, please.

Find a US special operations professional who relies on Palantir Gotham or IBM Analyst’s Notebook to determine a route through a hostile area. Ask whether a keyword search system or Palantir is more useful. Listen carefully to the answer.

No matter what keyword enthusiasts and quasi-slick New York consultants assert, enterprise search systems are not well suited for a great many real world applications. Heck, enterprise search often has trouble displaying documents which match the user’s query.

And why? Sluggish index updating, lousy indexing, wonky metadata, flawed set up, updates that kill a system, or interfaces that baffle users.

Personally I love to browse results lists. I like old fashioned high school type research too. I like to open documents and Easter egg hunt my way to a document that answers my question. But I am in the minority. Most users expect their finding systems to work without the query-read-click-scan-read-scan-read-scan Sisyphus-emulating slog.


Ah, you are thinking I have offered no court admissible evidence to support my argument, right? Well, just license a proprietary enterprise search system and let me know how your career is progressing. Remember when you look for a new job. You won’t search; you will insight.

Stephen E Arnold, April 12, 2017
