New Look for Internet Archive

October 29, 2014

The Internet Archive has a new look. You may have seen the change already; I had not, because I don’t visit the site very often. I struggle with its search system.

The new look features many postage stamp graphics and some text. Click a graphic, and you are sent to the appropriate Archive page.

Here’s a screenshot of the content available to you.


How does one search this content? The search box returns a list of hits, each with an icon indicating the content type. Have the cheerleaders for unified search cracked the information access challenge of using a single search box to reach mixed content types? I am still a fan of searching one collection at a time. It is inefficient, but I get a sense of each collection’s scope and the idiosyncrasies of the indexed information.

Searching today is more difficult than it was in 1980, in my opinion. The method required is to know what is in a collection before one queries it.

How does one know what’s in each of these collections? Well, unfortunately you can no longer ask a librarian in many organizations.

You are on your own, pilgrim.

Stephen E Arnold, October 29, 2014

EasyAsk Adds Glitter to Oya Costumes

October 29, 2014

I learned that Oya Costumes has tapped EasyAsk to provide the search function for its Web site. You can read the news release here. I clicked around using drop-downs and facets. I also ran a query to locate a suitable Harrod’s Creek Halloween costume: I searched for Darth Vader. The results were mostly on point. There was one anomaly, an inflatable purple suit. Perhaps Darth has a side few know about.

Here’s the result page for my query:


Here’s a close up of the purple outfit mapped to the query “Darth Vader.”


I quite like the inflatable purple suit. I assume it is semantically related to Mr. Vader.

Stephen E Arnold, October 29, 2014

The Cheating Search Engine

October 29, 2014

Spokeo is a people search engine: enter a name, email address, or username, and it returns personal information. The results usually include an address, phone number, other email addresses, or aliases. People searches are the digital equivalent of a phone book’s white pages, except that the information is easier to manipulate for the desired output. The results, however, are limited by how much you are willing to pay.

Most of the time, people search engine advertisements are banal, but Spokeo has tried a new approach. When you visit the Web site, you are greeted by the caption “Is He Cheating On You?” This warning follows it:

“CAUTION: This information is potentially shocking. Spokeo uses proprietary deep web technology to search over 70 social networks for status updates, photos, relationships, and profiles. Please prepare yourself for the unexpected.”

A picture of a couple caught in a scandalous position flanks the search box. Spokeo is trying to appeal to an entirely new clientele, perhaps the kind who click on advertisements about learning a new language in a week or melting away the pounds with a new, exclusive diet pill offer. The search results include dating profiles, social media accounts, aliases, hidden photos, etc. The type of information you

While Spokeo was never the first-choice people search engine, it does provide basic information about individuals. This new advertising campaign, however, pushes it into the lowbrow Internet and makes its content questionable. Why the sudden change in marketing? Is Spokeo seeing a revenue drop, or has it seen a spike in profits from being used as a cheating search engine?

Whitney Grace, October 29, 2014
Sponsored by, developer of Augmentext

Beyond Intranet Search

October 28, 2014

Apparently, there is a difference between search and knowledge management; I guess you learn something new every day. CMS Wire asks, “Intranet Search: Where Documents Go to Die or KM Enabler?” Writer Jed Cawthorne uses Coveo’s platform to illustrate ways a company can go beyond the “baked in” search functionality in an intranet content management system. He writes:

“You don’t need to stick with the ‘built in solution’ if search is important to your KM / Enterprise Information Management strategies. There are alternatives beyond the ever more standard SharePoint (even though building FAST technology into core SharePoint 2013 has improved it) or the really big (and expensive) heavy hitters like HP’s IDOL platform.

“With the growing rate at which our mountains of internal content grow ever bigger, search capabilities are a fundamental element of an intranet, and of the broader digital workplace. If you want to apply long tail principles to mountains of social content, such as discussion forums, news feeds and updates, a search engine with concept search capabilities would be a good idea, unless you have a work force which is truly at one with tagging absolutely everything with appropriate and valuable metadata … (what, you work in the Library of the Jedi Temple? Cool!).”

Cawthorne spoke to Coveo’s Diane Berry about her company’s knowledge management options. She emphasizes broad content-source connectivity, metadata enrichment through text analytics (for companies lacking Jedi librarians), and building taxonomies through entity extraction. A user interface based on users’ needs is also key, she notes, and mobile interfaces are a part of that. So is making it easy to adjust search and analysis parameters. See the write-up for more details and some screenshots that illustrate these points.

Cynthia Murrell, October 28, 2014

Sponsored by, developer of Augmentext

Enterprise Search, Knowledge Management, & Customer Service: Some of the Study Stuff Ups Evident?

October 27, 2014

One of my two or three readers sent me a link to “The 10 Stuff Ups We All Make When Interpreting Research.” The article walks through some common mistakes individuals make when “interpreting research.” I don’t agree with the “all” in the title.

This article arrived as I was reading a recent study about search. As an exercise on a surprisingly balmy Sunday afternoon in Kentucky, I jotted down the 10 “stuff ups” presented in the Interpreting Research article. Here they are in my words, paraphrased to sidestep plagiarism, copyright, and Google duplication finder issues:

  1. One study, not a series of studies. In short, an anomaly report.
  2. One person’s notion of what is significant may be irrelevant.
  3. Mixing up risk and the Statistics 101 notion of “number needed to treat” puts the cart before the horse.
  4. Trends may not be linear.
  5. Humans find what they want to find; that is, preexisting bias or cooking the study.
  6. Ignore the basics and layer cake the jargon.
  7. Numbers often require context. Context in the form of quotes from one-on-one interviews requires numbers.
  8. Models and frameworks do not match reality; that is, a construct is not what is.
  9. Specific situations do matter.
  10. Inputs from colleagues may not identify certain study flaws.
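Item 3 is easier to see with numbers. The standard definitions from evidence-based medicine are shown below. In these, CER is the event rate in an untreated (control) group and EER is the event rate in a treated group:

$$
\text{ARR} = \text{CER} - \text{EER}, \qquad
\text{RRR} = \frac{\text{CER} - \text{EER}}{\text{CER}}, \qquad
\text{NNT} = \frac{1}{\text{ARR}}
$$

Take the illustrative, hypothetical figures CER = 0.04 and EER = 0.02. A report can truthfully say the treatment cuts risk in half (RRR = 0.02 / 0.04 = 0.5). Yet ARR = 0.02 and NNT = 1 / 0.02 = 50, so fifty people must be treated for one to benefit. The data are the same, but the impressions are very different.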

To test the article’s premises, I turned to a study sent to me by a person named Alisa Lipzen. Its title is “The State of Knowledge Management: 2014. Growing Role & Value of Unified Search in Customer Service.” (If the link does not work for you, you will have to contact either of the sponsors, the Technology Services Industry Association or Coveo, an enterprise search vendor based in Canada.) You may have to pay for the report. My copy was free. Let’s do a quick pass through the document to see if it avoids the “stuff ups.”

First, the scope of the report is broad:

1. Knowledge management. Although I write a regular column for KMWorld, I must admit that I am not able to define exactly what this concept means. Like many information access buzzwords, the shotgun marriage of “knowledge” and “management” glues together two abstractions. In most usages, knowledge management refers to figuring out what a person “knows” and making that information available to others in an organization. After all, when a person quits, having access to that person’s “knowledge” has a value. But “knowledge” is as difficult to nail down as “management.” I suppose one knows it when one encounters it.

2. Unified search. The second subject is “unified search.” This is the idea that a person can use a single system to locate information germane to a query from a single search box. Unified suggests that widely disparate types of information are presented in a useful manner. For me, the telling point is that Google, arguably the best resourced information access company, has been unable to deliver unified search. Note that Google calls its goal “universal search.” In the 1980s, Fulcrum Technologies (Ottawa, Canada) offered a version of federated search. In 2014, Google requires a user to run a query separately across different silos of information. For example, if I require information about NGFW, I have to run the query across Google’s Web index, Google scholarly articles, Google videos, Google books, Google blogs, and Google news. This is not very universal. Most “unified” search solutions are marketing razzle dazzle, for financial, legal, technical, and other reasons. Therefore, organizations have to have different search systems.
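The gap between the “unified” promise and the federated reality is easier to see in code. Here is a minimal Python sketch of the federated approach. It assumes three hypothetical in-memory silos and a naive term-count score. The silo names, documents, and functions are illustrative only, not any vendor’s API. One query goes to every silo, and the hits are merged into one list tagged with their source.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    silo: str     # which collection the hit came from (the "content type" icon)
    title: str
    score: float


# Hypothetical silos standing in for separate indexes (Web, books, video, ...).
SILOS: Dict[str, List[str]] = {
    "web": ["NGFW vendor overview", "Firewall basics"],
    "books": ["Network security handbook: NGFW chapter"],
    "video": ["NGFW demo walkthrough", "Cooking with cast iron"],
}


def search_silo(silo: str, docs: List[str], query: str) -> List[Hit]:
    """Naive per-silo search: score = number of query-term occurrences in the title."""
    terms = query.lower().split()
    hits = []
    for title in docs:
        words = title.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            hits.append(Hit(silo, title, float(score)))
    return hits


def federated_search(query: str) -> List[Hit]:
    """Send one query to every silo and merge the results into one ranked list."""
    merged: List[Hit] = []
    for silo, docs in SILOS.items():
        merged.extend(search_silo(silo, docs, query))
    # Sorting on raw scores is the naive step: it only works here because every
    # silo uses the same toy scorer. Real engines' scores are rarely comparable.
    return sorted(merged, key=lambda h: h.score, reverse=True)


if __name__ == "__main__":
    for hit in federated_search("NGFW"):
        print(f"[{hit.silo}] {hit.title} ({hit.score})")
```

The hard part is the merge, not the fan-out. Separate engines score relevance on different scales, so a single ranked list across silos needs some normalization scheme. That is one reason “unified” results often end up as a page stitched together from separate silos.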

3. Customer service. This is a popular bit of jargon. The meaning of customer service, for me, boils down to cost savings. Few companies have the appetite to pay expensive humans to deal with the problems paying customers experience. Last week, I spent one hour on hold with an outfit called Wellcare. The insurance company’s automated system reassured me that my call was important. The call was never answered. What did I learn? Neither my call nor my status as a customer was important. Most information access systems applied to “customer service” are designed to drive the cost of support and service as low as possible.


“Get rid of these expensive humans,” says the MBA. “I want my annual bonus.”

I was not familiar with the TSIA. What is its mission? According to the group’s Web site:

TSIA is organized around six major service disciplines that address the major service businesses found in a typical technology company.

Each service discipline has its own membership community led by a seasoned research executive. Additionally, each service discipline has the following:

In addition, we have a research practice on Service Technology that spans across all service discipline focus areas.

My take is that TSIA is a marketing-oriented organization for its paying members.

Now let’s look at some of the report’s key findings:

The people, process, and technology components of technology service knowledge management (KM) programs. This year’s survey examined core metrics and practices related to knowledge capture, sharing, and maintenance, as well as forward-looking elements such as video, crowd sourcing, and expertise management. KM is no longer just of interest to technical support and call centers. The survey was open to all TSIA disciplines, and 50% of the 400-plus responses were from groups other than support services, including 24% of responses from professional services organizations.


Living with Google Requires Innovation

October 27, 2014

The article on South China Morning Post Technology titled “Search Websites Diversify in Scope and Learn to Coexist with Google” explores the options for Google’s ugly stepsisters, Bing and Yahoo (among others). Rather than attempting to unseat the search giant, Chris Wallace of Mindshare Worldwide and Will McInnes of Brandwatch advocate a tailoring approach for search engines not named Google. The article states,

“Microsoft has Xbox, and this is its opportunity to integrate into the living room and be the search device of choice there” Wallace says… niche search engines are emerging, usually with one killer app that does something specific Google can’t match. None will take over from the Big G any time soon, but if you have a specific need, they’re worth bearing in mind… The message? Don’t avoid Google, but diversify your usage.”

Whether you are looking specifically for music, social media data, or the latest news, there are alternatives to Google in the form of Live Plasma, Blekko and Pinterest or even Facebook. The article suggests loosening the Google security blanket we have wrapped ourselves in so cozily and considering other options. Specialized search engines like Yelp for restaurants will help us more because they are tailor-made for one area of the market.

Chelsea Kerwin, October 27, 2014

Sponsored by, developer of Augmentext

Semantic Web: Remember That for Enterprise Search?

October 26, 2014

You can find an interesting discussion of the Semantic Web on Hacker News. Semantic Web search engines have had a difficult time capturing the imagination of the public. The write up and the comments advance the notion that the Semantic Web is alive and well, just invisible.

I found the statement from super Googler Peter Norvig a window into how Google views the Semantic Web. Here’s the snippet:

Peter Norvig put it best: “The semantic web is the future of the web, and always will be.” (For what it’s worth, the startup school video that quote comes from is worth watching:

There are references to “semantic search” companies that have failed; for example, Ontoprise. There are links to clever cartoons.

The statement I highlighted was:

The underlying data just doesn’t necessarily map very well into the seem-web representations, so duplicates occur and possible values explode in their number of valid permutations even though they all mean the same handful of things. And it’s the read-only semantic-web, so you can’t just clean it, you have to map it.. Which is why I’m always amazed that works at all. And hopefully one day will be a thing. I remember being excited about for “liberating” messy data into clean linked data… but it turns out that you really don’t want to curate your information “in the graph”; it seems obvious, but traditional relational datasets are infinitely more manageable than arbitrarily connected nodes in a graph. So, most CMS platforms are doing somewhat useful things in marking up their content in machine-readable ways (RDFa, [as evil as that debacle was], HTTP content-type negotiation and so on) either out-of-the-box or with trivially installed plugins.

Ah, content management systems. Now that’s the model for successful information access as long as one does not want engineering drawings, videos, audio, binaries, and a host of proprietary data types like i2 Analyst Notebook files.

Worth checking out the thread in my view.

Stephen E Arnold, October 26, 2014

ArnoldIT Search Requirements Video

October 26, 2014

The goslings continue to experiment with short videos. The most recent one is about enterprise search requirements. The four-minute YouTube program hits some highlights of the perilous process of licensing an enterprise search system. The video is located at

Donald C Anderson, October 26, 2014

Stone Temple Consulting Creates Knowledge Panel to Test Google, Siri, and Cortana

October 24, 2014

The article titled “The Great Knowledge Box Showdown: Google Now vs. Siri vs. Cortana” on Stone Temple Consulting compares the capabilities of the three platforms using 3,086 queries.

“This was a…knowledge box comparison, not a personal assistant comparison. For purposes of this study, a “knowledge box” or “knowledge panel” is defined as content in the search results that attempts to directly answer a question asked in a search query…”

The article provides a long list of the sort of questions posed to the three platforms and the ways the different systems answered correctly and incorrectly. Knowledge boxes might appear in the form of a carousel (a set of images or info above the search results) or as step-by-step instructions (provided, for example, in response to a question about how to make a certain recipe). Ultimately, the article found,

Google Now returns twice as many results as Siri and nearly three times as many results as Cortana. This is clear evidence that Google is much further down the path with this type of work than either Apple or Cortana.”

It is also important to mention that, according to the article, errors occur 15 percent of the time in the best system. For mission-critical situations, that is a problem. Hmm.

Chelsea Kerwin, October 24, 2014

Sponsored by, developer of Augmentext

Cautious Words on Microsoft Delve

October 22, 2014

Much buzz has been collecting around Microsoft’s Delve (formerly known as Oslo), the new search-and-discovery component of Office 365. ComputerWorldUK, however, raises some questions in, “Delve, Office Graph Must Transcend Office 365 to be Revolutionary.” The application is designed to tap into the company’s Office Graph machine-learning engine, but apparently has a way to go before fulfilling its creators’ goals. Reporter Juan Carlos Perez writes:

“If Microsoft realizes its Office Graph vision — and it may take years to materialize — then the way information workers interact with business software today and the way they find digital information will seem ancient and grossly inefficient. And Microsoft might fly past competitors in the enterprise with a technology that creates a sort of cockpit that automates and simplifies for employees the use of their Microsoft and non-Microsoft software.”

Delve began gradually rolling out to Office users in September, with the process to be completed sometime next year. The tool can be used as a conventional search engine, but it is designed to do much more. The article supplies this example:

“Delve knows that ‘Joe’ has a meeting in an hour, what its topic is and who will be in attendance. So, Delve proactively fetches relevant documents, files and information about the topic and the participants, and displays them on its dashboard, so Joe can be prepared for the meeting. Joe didn’t have to spend 30 minutes compiling all this data manually, assuming that he even would have had the time to do it, and if he did, that he would have been able to find the information, a big challenge for employees of all stripes everywhere.”

Sounds great! However, Perez notes that some open questions stand between here and the realization of Delve’s potential. Perhaps most obviously, being able to comb only Office applications for data is limiting; most of us don’t limit ourselves to Microsoft products (as much as the company might like us to). There are considerable technical challenges there. Then there’s the privacy issue: will users find its “stealthy technology” creepy and worry about nosy supervisors? Apparently, more end-user controls are planned, but they may not address that concern. See the article for a more thorough discussion of these issues. Will Delve overcome these obstacles?

Cynthia Murrell, October 22, 2014

Sponsored by, developer of Augmentext
