Connotate: Automated Data Extraction That Seems to Work Like a Traditional Alert

November 4, 2014

I found the write up “Using Automated Data Extraction to Find Out Who Makes How Much and Where They Make It” suggestive of what search systems will have to do to survive. The blog post presents information about Connotate’s automation functions.

I learned that Connotate has a client interested in gathering information about salaries. The write up reported:

They’re [the client] trying to scale up and found they could look into salaries and titles only in downtimes, and that wasn’t very often. In fact, they’ve been able to go to only a couple of websites and get information for just two job titles in two countries. But their plans call for learning about hundreds, if not thousands, of job titles across 75 countries. Since they were doing this manually, and only when time permitted, getting to where they needed to be was almost impossible.

The shift to automation as a key feature of information access is important. However, note that the client had a known problem and knew what information was required. Connotate then performed a standing query on accessible content and provided outputs to the client.

However, what about clients who do not know what information is germane to their business? How can automation that mimics knowing what to look for assist with pinpointing unknowns?

Search vendors will have to shift into a different development mode in order to provide services that deal with high volatility and unknowns in today’s business climate.

Stephen Arnold, November 4, 2014

Overreliance on Search Engines?

November 3, 2014

The article titled Google Makes Us All Dumber: The Neuroscience of Search Engines probes the ever-increasing reliance on search engines and finds that the way we use them is problematic. This is due to the way our brains respond to this simplified question and answer process. The article stipulates that the harder we work for knowledge, the more likely we are to store it. When it is as simple as typing in a search query and reading a simple answer, we will forget the answer as easily as we found it. The article explains,

“It’s not that the Internet is making us stupid or incurious. Only we can do that. It’s that we will only realize the potential of technology and humans working together when each is focused on its strengths — and that means we need to consciously cultivate effortful curiosity. Smart machines are taking over more and more of the tasks assumed to be the preserve of humans. But no machine, however sophisticated, can yet be said to be curious.”

We are all guilty of using Google as a shortcut to end a fight over a fact or using IMDB to quickly be reminded of that actor what’s-her-name in that movie whatdya-callit. But the article points out that as we lean more on search engines in this fashion, it will only shorten our recall and diminish our ability to ask interesting questions.

Chelsea Kerwin, November 03, 2014

Sponsored by, developer of Augmentext

Predictive Analytics: An Older Survey with Implications for 2015

November 2, 2014

In my files I had a copy of the 2009 Predictive Analytics World survey about, not surprisingly, predictive analytics. When I first reviewed the data in the report, I noted that “information retrieval” or “search” were not to be found. Before the bandwagon began to roll for predictive analytics, search technology was not in the game if I interpret the survey data correctly.

The factoid I marked was revealed in this table:


The planned use of predictive analytics was for fraud detection. It appears that 64 percent of the sample planned to adopt predictive analytics for criminal or terrorist detection. The method requires filtering various types of public information including text.

Are vendors of enterprise search and content processing systems leaders in this sector in 2014? Based on my research, content processing vendors provide the equivalent of add-in utilities to the popular systems. The hypothesis I have formulated is that traditional information retrieval companies find themselves relegated to a supporting role.

Looking forward to 2015, I see growing dominance by leaders in the cyber OSINT market. Irrelevancy awaits the traditional search vendor unable to identify and then deliver high value solutions to a high demand, high growth market sector.

Has IDC or Dave Schubmehl tracked this sector? I don’t think so. As I produce more information about this market, I anticipate some me-too activity, however.

Stephen E Arnold, November 2, 2014

Whither Bing?

October 31, 2014

I learned in “Microsoft’s Advertising Unit Shuts Down Global Agency, Creative Team in Latest Layoffs” that the latest round of cutbacks strikes at Bing ad sales. Other announcements revealed that one of the Microsoft cheerleaders for the search engine optimization crowd has been given an opportunity to find his future elsewhere. (See “Bing Severs Ties with Webmasters by Firing Duane Forrester.”)

What’s up? Well, maybe Microsoft has come to the conclusion that Google is going to get most of the money from online search.

From my vantage point in Harrod’s Creek, the shift makes it clear that what Bing was doing wasn’t making certain folks in the executive suite giddy with joy.

So, let me ask the interesting question, “Has Google claimed another Web search victim?” If that is the case, you will want to read my forthcoming article in Information Today about the outlook for Qwant, the French Google killer built on Pertimm technology and singled out by Eric Schmidt as a threat to all things Googley.

I know it is not popular to suggest that the Google is a monopoly, but if Microsoft is not committed to pumping money into Bing, who will challenge the balloon-crazed fighters of death in Mountain View?

How often do you use Jike, iSeek, Ixquick, or Yandex—or Bing?

Stephen E Arnold, October 31, 2014

New Look for Internet Archive

October 29, 2014

The Internet Archive has a new look. You may have seen the change, but I don’t visit the site too frequently. I struggle with its search system.

The new look features many postage stamp graphics and some text. Click on a graphic and one is sent to the appropriate Archive page.

Here’s a screenshot of the content available to you.


How does one search this content? The search box returns a list of hits with an icon indicating the content type. Have the cheerleaders for unified search cracked the information access challenge of using a single search box to access mixed content types? I am still a fan of one at a time searching. Inefficient, but I get a sense of the collection’s scope and the idiosyncrasies of the indexed information.

Searching today is more difficult than it was in 1980 in my opinion. The method required is to know what is in a collection before one queries it.

How does one know what’s in each of these collections? Well, unfortunately you can no longer ask a librarian in many organizations.

You are on your own, pilgrim.

Stephen E Arnold, October 29, 2014

EasyAsk Adds Glitter to Oya Costumes

October 29, 2014

I learned that Oya Costumes has tapped EasyAsk to provide the search function for its Web site. You can read the news release here. I clicked around using drop downs and facets. I did run a query to locate a suitable Harrod’s Creek Halloween costume. I searched for Darth Vader. The results were mostly on point. There was one anomaly, an inflatable purple suit. Perhaps Darth has a side few know about.

Here’s the result page for my query:


Here’s a close up of the purple outfit mapped to the query “Darth Vader.”


I quite like the inflatable purple suit. I assume it is semantically related to Mr. Vader.

Stephen E Arnold, October 29, 2014

The Cheating Search Engine

October 29, 2014

Spokeo is a people search engine. Enter a name, email address, or username, and it returns personal information. The results usually include address, phone number, other email addresses, or aliases. People searches are the digital equivalent of a phone book’s white pages, except it is easier to manipulate the information for the desired output. The results, however, are limited by how much you are willing to pay.

Most of the time people search engine advertisements are banal, but Spokeo has tried a new approach. When you visit the Web site, you are greeted by the following caption: “Is He Cheating On You?” This warning follows it:

“CAUTION: This information is potentially shocking. Spokeo uses proprietary deep web technology to search over 70 social networks for status updates, photos, relationships, and profiles. Please prepare yourself for the unexpected.”

A picture of a couple caught in a scandalous position flanks the search box. Spokeo is trying to appeal to an entirely new clientele, perhaps the kind who click on advertisements about learning a new language in a week or melting away the pounds with a new, exclusive diet pill offer. The search results include dating profiles, social media accounts, aliases, hidden photos, etc. The type of information you

While Spokeo was not the first people search engine of choice, it does provide basic information about individuals. This new advertising campaign, however, pushes it into the lowbrow Internet and makes its content questionable. Why the sudden change in marketing? Is Spokeo seeing a revenue drop, or has it seen a spike in profits when used as a cheating search engine?

Whitney Grace, October 29, 2014

Beyond Intranet Search

October 28, 2014

Apparently, there is a difference between search and knowledge management; I guess you learn something new every day. CMS Wire asks, “Intranet Search: Where Documents Go to Die or KM Enabler?” Writer Jed Cawthorne uses Coveo’s platform to illustrate ways a company can go beyond the “baked in” search functionality in an intranet content management system. He writes:

“You don’t need to stick with the ‘built in solution’ if search is important to your KM / Enterprise Information Management strategies. There are alternatives beyond the ever more standard SharePoint (even though building FAST technology into core SharePoint 2013 has improved it) or the really big (and expensive) heavy hitters like HP’s IDOL platform.

“With the growing rate at which our mountains of internal content grow ever bigger, search capabilities are a fundamental element of an intranet, and of the broader digital workplace. If you want to apply long tail principles to mountains of social content, such as discussion forums, news feeds and updates, a search engine with concept search capabilities would be a good idea, unless you have a work force which is truly at one with tagging absolutely everything with appropriate and valuable metadata … (what, you work in the Library of the Jedi Temple? Cool!).”

Cawthorne spoke to Coveo’s Diane Berry about her company’s knowledge management options. She emphasizes broad content-source connectivity, metadata enrichment through text analytics (for companies lacking Jedi librarians), and building taxonomies through entity extraction. A user-interface based on users’ needs is also key, she notes, and mobile interfaces are a part of that. So is making it easy to adjust search and analysis parameters. See the write-up for more details and some screenshots that illustrate these points.

Cynthia Murrell, October 28, 2014


Enterprise Search, Knowledge Management, & Customer Service: Some of the Study Stuff Ups Evident?

October 27, 2014

One of my two or three readers sent me a link to “The 10 Stuff Ups We All Make When Interpreting Research.” The article walks through some common mistakes individuals make when “interpreting research.” I don’t agree with the “all” in the title.

This article arrived as I was reading a recent study about search. As an exercise on a surprisingly balmy Sunday afternoon in Kentucky, I jotted down the 10 “stuff ups” presented in the Interpreting Research article. Here they are in my words, paraphrased to sidestep plagiarism, copyright, and Google duplication finder issues:

  1. One study, not a series of studies. In short, an anomaly report.
  2. One person’s notion of what is significant may be irrelevant.
  3. Mixing up risk and the Statistics 101 notion of “number needed to treat” gets the cart before the horse.
  4. Trends may not be linear.
  5. Humans find what they want to find; that is, pre existing bias or cooking the study.
  6. Ignore the basics and layer cake the jargon.
  7. Numbers often require context. Context in the form of quotes from one on one interviews requires numbers.
  8. Models and frameworks do not match reality; that is, a construct is not what is.
  9. Specific situations do matter.
  10. Inputs from colleagues may not identify certain study flaws.

To test the article’s premises, I turned to a study sent to me by a persona named Alisa Lipzen. Its title is “The State of Knowledge Management: 2014. Growing role & Value of Unified Search in Customer Service.” (If the link does not work for you, you will have to contact either of the sponsors, the Technology Services Industry Association or Coveo, an enterprise search vendor based in Canada.) You may have to pay for the report. My copy was free. Let’s do a quick pass through the document to see if it avoids the “stuff ups.”

First, the scope of the report is broad:

1. Knowledge management. Although I write a regular column for KMWorld, I must admit that I am not able to define exactly what this concept means. Like many information access buzzwords, the shotgun marriage of “knowledge” and “management” glues together two abstractions. In most usages, knowledge management refers to figuring out what a person “knows” and making that information available to others in an organization. After all, when a person quits, having access to that person’s “knowledge” has a value. But “knowledge” is as difficult to nail down as “management.” I suppose one knows it when one encounters it.

2. Unified search. The second subject is “unified search.” This is the idea that a person can use a single system to locate information germane to a query from a single search box. Unified suggests that widely disparate types of information are presented in a useful manner. For me, the telling fact is that Google, arguably the best resourced information access company, has been unable to deliver unified search. Note that Google calls its goal “universal search.” In the 1980s, Fulcrum Technologies (Ottawa, Canada) offered a version of federated search. In 2014, Google requires that a user run a query across different silos of information; for example, if I require information about NGFW, I have to run the query across Google’s Web index, Google scholarly articles, Google videos, Google books, Google blogs, and Google news. This is not very universal. Most “unified” search solutions are marketing razzle dazzle for financial, legal, technical, and other reasons. Therefore, organizations have to have different search systems.

3. Customer service. This is a popular bit of jargon. The meaning of customer service, for me, boils down to cost savings. Few companies have the appetite to pay for expensive humans to deal with the problems paying customers experience. Last week, I spent one hour on hold with an outfit called Wellcare. The insurance company’s automated system reassured me that my call was important. The call was never answered. What did I learn? Neither my call nor my status as a customer was important. Most information access systems applied to “customer service” are designed to drive the cost of support and service as low as possible.


“Get rid of these expensive humans,” says the MBA. “I want my annual bonus.”

I was not familiar with the TSIA. What is its mission? According to the group’s Web site:

TSIA is organized around six major service disciplines that address the major service businesses found in a typical technology company.

Each service discipline has its own membership community led by a seasoned research executive. Additionally, each service discipline has the following:

In addition, we have a research practice on Service Technology that spans across all service discipline focus areas.

My take is that TSIA is a marketing-oriented organization for its paying members.

Now let’s look at some of the report’s key findings:

The people, process, and technology components of technology service knowledge management (KM) programs. This year’s survey examined core metrics and practices related to knowledge capture, sharing, and maintenance, as well as forward-looking elements such as video, crowd sourcing, and expertise management. KM is no longer just of interest to technical support and call centers. The survey was open to all TSIA disciplines, and 50% of the 400-plus responses were from groups other than support services, including 24% of responses from professional services organizations.


Living with Google Requires Innovation

October 27, 2014

The article on South China Morning Post Technology titled Search Websites Diversify in Scope and Learn to Coexist with Google explores the options for Google’s ugly stepsisters Bing and Yahoo (among others). Rather than even attempting to unseat the search giant, Chris Wallace of Mindshare Worldwide and Will McInnes of Brandwatch advocate a tailoring approach for search engines not named Google. The article states,

“Microsoft has Xbox, and this is its opportunity to integrate into the living room and be the search device of choice there” Wallace says… niche search engines are emerging, usually with one killer app that does something specific Google can’t match. None will take over from the Big G any time soon, but if you have a specific need, they’re worth bearing in mind… The message? Don’t avoid Google, but diversify your usage.”

Whether you are looking specifically for music, social media data, or the latest news, there are alternatives to Google in the form of Live Plasma, Blekko and Pinterest or even Facebook. The article suggests loosening the Google security blanket we have wrapped ourselves in so cozily and considering other options. Specialized search engines like Yelp for restaurants will help us more because they are tailor-made for one area of the market.

Chelsea Kerwin, October 27, 2014

