Natural Language Processing: Tomorrow and Yesterday

October 31, 2017

I read “Will Natural Language Processing Change Search as We Know It?” The write up is by a search specialist who, I believe, worked at Convera. The Search Technologies’ Web site asserts:

He was the architect and inventor of RetrievalWare, a ground-breaking natural-language based statistical text search engine which he started in 1989 and grew to $50 million in annual sales worldwide. RetrievalWare is now owned by Microsoft Corporation.

I think Fast Search acquired a portion of Convera. When Microsoft purchased Fast Search, the Convera technology was part of the deal. When Convera faded, one rumor I captured in 2007 was that some of the Convera technology was used by Ntent, formed as the result of a merger between Convera Corporation and Firstlight ERA. If accurate, the history of Convera is fascinating with Excalibur, ConQuest, and Allen & Co. in the mix.

In the “Will Natural Language Processing Change Search as We Know It?” blog post, I noted these points:

  • Intranets incorporating NLP, semantic search and AI can fuel chatbots as well as end-to-end question-answering systems that live on top of search. It is a truly semantic extension to the search box with far-reaching implications for all types of search.
  • With NLP, enterprise knowledge contained in paper documentation can be encoded in a machine-readable format so the machine can read, process and understand it enough to formulate an intelligent response.
  • it’s good to know about established tool sets and methodologies for developing and creating effective solutions for use cases like technical support. But like all development projects, take care to create the tools based on mimicking the responses of actual human domain experts. Otherwise, you may run into the proverbial development problem of “garbage in, garbage out” which has plagued many such expert system initiatives.

Mr. Nelson is painting a reasonable picture about the narrow use of widely touted technologies. In fact, the promise of NLP has been part of enterprise search marketing for decades.

What I found interesting was the Convera document called “Accurate Search: What a Concept,” published by Convera in 2002. I noted this passage on page 4 of the document:

Concept Search capitalizes on the richness of language, with its multiple term meanings, and transforms it from a problem into an advantage. RetrievalWare performs natural language processing and search term expansion to paraphrase queries, enabling retrieval of documents that contain the specific concepts requested rather than just the words typed during the query while also taking advantage of its semantic richness to rank documents in results lists. RetrievalWare’s powerful pattern search abilities overcome common errors in both content and queries, resulting in greater recall and user satisfaction.

I find the shift from a broad solution to a more narrow solution interesting. In the span of 15 years, the technology of search seems to be struggling to deliver.

Perhaps consulting and engineering services are needed to make search “work”? Contrast search with mobile phone technology. Progress has been evident. For search, success narrows to improving “documentation” and “customer support.”

Has anyone tried to reach PayPal’s customer support or United Airlines’ customer support? Try it. United was at one time a “customer” of Convera’s. From my point of view, United Airlines’ customer service has remained about the same over the last decade or two.

Enterprise search, broad or narrow, remains a challenge for marketers and users in my opinion. NLP, I assume, has arrived after a long journey. For a free profile of Convera, check out this link.

Stephen E Arnold, October 31, 2017

Big Data Less Accessible for Small and Mid-Size Businesses

October 31, 2017

Even as the term “Big Data” grows stale, small and medium-sized businesses (SMB’s) are being left behind in today’s data-driven business world. The SmartData Collective examines the issue in, “Is Complexity Strangling the Real-World Benefits of Big Data for SMB’s?” Writer Rehan Ijaz supplies this example:

Imagine a local restaurant chain fighting to keep the doors open as a national competitor moves into town. The national competitor will already have a competent Cloud Data Manager (CDM) in place to provide insight into what should be offered to customers, based on their past interactions. A multi-million-dollar technology is affordable, due to scale, for a national chain. The same can’t be said for a smaller, mom and pop type restaurant. They’ve relied on their gut instinct and hometown roots to get them this far, but it may not be enough in the age of Big Data. Large companies are using their financial muscle to get information from large data sets, and take targeted action to outmaneuver local competitors.

Pointing to an article from Forbes, Ijaz observes that the main barrier for these more modestly-sized enterprises is not any hesitation about the technology itself, but rather a personnel issue: their existing marketing employees were not hired for their IT prowess, and even the most valuable data requires analysis to be useful. Few SMB’s are eager to embrace the cost and disruption of hiring data scientists and reorganizing their marketing teams; they have to be sure it will be worth the trouble.

Ijaz hopes that the recent increase in scalable, cloud-based analysis solutions will help SMB’s with these challenges. The question is, he notes, whether it is too late for many SMB’s to recover from their late foray into Big Data.

Cynthia Murrell, October 31, 2017

Google Deletes Idle Android Accounts

October 31, 2017

If you have future plans that take you overseas to areas with limited to zero smartphone connectivity for more than two months, and you have an Android-based phone, you are in trouble.  Vernonchan reports that “Google Will Delete Your Android Backups If Your Device Is Inactive For Two Months.”

It came as quite a shock for the article’s author and only came to light because a Reddit user was caught off guard.  The story goes that the Reddit user sent his Nexus 6P in for a refund claim and while waiting for a replacement Android device, he used an old phone.  He thought his Nexus 6P backups were safe, but when he checked his Google Drive backup folder they were gone!

He found this document related to backups: “Manage & Restore Your Device Backups In Google.”  Here is what is found in the document:

The document briefly details about finding, managing, and deleting backups. Right at the end, Google explains what happens when your backup expires: ‘Your backup will remain as long as you use your device. If you don’t use your device for 2 weeks, you may see an expiration date below your backup. For instance: “Expires in 54 days.”’

Note that once a backup is deleted, there is zero chance for recovery.

In other words, you are screwed!  If you use your device regularly, there is nothing to worry about.  But if you are headed overseas for that rare place on Earth with limited to zero smartphone access for an extended period, you are doomed.  If you have a warranty claim or send the device in for repair, there are concerns there as well.

The Reddit user was extremely surprised that he never received any warning from Google and thought that it would be a good PSA to alert others to the time limit on backups.

Google, we know you have a lot going on, but it is good customer service to alert your users to things as important as data deletion!

Whitney Grace, October 31, 2017

Google Management: Doing Better with Burgers

October 30, 2017

With pressure mounting on US search and social media companies to become more “responsible”, I noted that a high priority for Google’s CEO is a cheeseburger emoji. I read a rather tasty article called “Google CEO Makes Fixing Hamburger Emoji His Top Priority.” The write up points out that the Apple cheeseburger emoji has the cheese on top of the meat patty. The Google emoji puts the cheese on the bottom of the meat patty. It is good to know that when serious issues rise up to choke a bureaucracy, senior managers can respond. Management by example, pickle, and lettuce.

Stephen E Arnold, October 30, 2017

SEO Benefits Take Time to Realize

October 30, 2017

In many (most?) fields today, it is considered essential for companies to position themselves as close to the top of potential customers’ Web search results as possible. However, search engine optimization (SEO) efforts take time. Business 2 Community explains “Why It Takes Six Months to Improve Search Rankings.”  Marketers must accept that, unless they luck out with content that goes viral, they will just have to be patient for results. Writer Kent Campbell explains five reasons this is the case, and my favorite is number one—search systems were not built to aid marketers in the first place! In fact, in some ways, quite the opposite. Campbell writes:

Bing and Google Serve Their Searchers, Not You.

A search provider’s primary concern is its users, not you or any other business that’s fighting for a spot on the first page. The search engine’s goal is to provide the best user experience to its searchers; that means displaying the most relevant and high quality results for every search query. Both Bing and Google watch how people react to content before they decide how visible that content should be in results. Even when content has had a lot of SEO therapy, the content itself has to be spot-on. This is why Google evaluates every piece of content on more than 200 ranking factors and ensures that only the best quality pages make it to the top 10. The best way to make it to the first page is by aligning yourself with Google’s objective, which is to serve its users.

A company might be seeing slow results because they hesitated—Early Movers Have an Advantage is the second reason Campbell gives. On the other hand, at number three, we find that Creating Quality Content Takes Time. Then there is the fact that Link Building Is Not as Simple as Before. Finally, there’s this more recent complication—Social Media Also Impacts Rankings these days. See the article for Campbell’s explanation for each point. He concludes with a little advice: companies would do well to consider their SEO efforts an ongoing cost of doing business, rather than an extraordinary item.

Cynthia Murrell, October 30, 2017

Privacy Is Lost in Translation

October 30, 2017

Online translation tools are a wonder!  Instead of having to rely on limited dictionaries and grammars, online translation tools deliver real-time, nearly accurate translations of documents and other text.  It is usually good to double check the translation because sometimes the tools do make mistakes.  Translation tools, however, can also cost users their privacy.  Quartz tells an alarming story in, “If You Value Your Privacy, Be Careful With Online Translation Tools.”

Norwegian state oil company Statoil used a free online translation site to translate sensitive company documents.  One would think that would not be a problem, except the site stored the data in the cloud.  The sensitive documents included dismissal letters, contracts, workforce reduction plans, and more.  News traveled fast in Norway, resulting in the Oslo Stock Exchange blocking employees’ access to the site and Google Translate.

It was dubbed a massive privacy breach as private documents from other organizations and individuals were discovered. The translation company views the incident differently:

[The company] sees things a little differently, however, saying it was straight with users about the fact that it was crowdsourcing human translations to improve on machine work. In a Sept. 6 blog post responding to the news reports, the company explained that in the past, they were using human volunteer translators to improve their algorithm, and during that time, had made documents submitted for translation public so that any human volunteers could easily access them. “As a precaution, there was a clear note on our homepage stating: ‘All translations will be sent to our community to improve accuracy.’”

The company also offered to remove any documents upon request, but sensitive documents were still available when the Quartz article was written.  The company’s vice president of sales, Maria Burud, pointed out that it offers paid translation software intended for businesses that want to maintain their privacy.  Burud notes that anything translated using a free web tool is bound to have privacy issues, but that there is a disclaimer on her company’s Web site.  It is up to the user to de-identify the information or watch what they post in a translation box.

In other words, watch what you translate and post online.  It will come back to haunt you.

Whitney Grace, October 30, 2017

Amazon Google Money Factoids

October 29, 2017

I noted the financial results of Amazon and Google. Amazon reported third quarter sales of $43 billion. Google tallied revenues of $27.7 billion. Amazon has multiple revenue streams; Google is making Steve Ballmer’s one-trick pony comment hold true. Will Google close the revenue gap? Will Amazon stumble?

Stephen E Arnold, October 29, 2017

Great Moments in Publishing: The Gray Lady on Tor

October 28, 2017

I read “The New York Times Is Now a Tor Onion Service.” Interesting. Tor attracts about three million users per month.

I found the decision a bit of a surprise. Increasing censorship squeezes some individuals to “hidden” information services. I am aware of the data which suggests that Tor and other hidden services are used for good purposes. For a run down on nine benefits, review “9 Things You Probably Don’t Know about Positive Uses of the Dark/Deep Web.” [I corrected the misspelling of “probable” in the title.]

On the other hand, other individuals use hidden services for less sunlight and happiness type activities. See, for example, “Dark Web Browser Tor Is Overwhelmingly Used for Crime, Says Study.”

The New York Times wants traffic and subscribers.

I will be watching for a surge in New York Times revenue and a spate of new Dark Web services. The Dark Web does offer online advertising. Perhaps this will be a new frontier for the newspaper. For more information about our most recent monograph, check out the description of Dark Web Notebook.

Stephen E Arnold, October 28, 2017

Need Better Charts and Graphs?

October 27, 2017

If you want to move beyond the vanilla charts and graphs in Excel and PowerPoint, you will want to read “The 15 Best Data Visualization Tools.” Don’t forget to make sure the data you present are accurate, timely, and germane to the point your snappy graphic will make. (Keep in mind that some folks are happy with snazzy visuals. Close enough for horseshoes.)

Stephen E Arnold, October 27, 2017

Enterprise Search: Will Synthetic Hormones Produce a Revenue Winner?

October 27, 2017

One of my colleagues provided me with a copy of the 24-page report with the hefty title:

In Search for Insight 2017. Enterprise Search and Findability Survey. Insights from 2012-2017

I stumbled on the phrase “In Search for Insight 2017.”

The report combines survey data with observations about what’s going to make enterprise search great again. I use the word “again” because:

  • The buy up and sell out craziness which culminated with Microsoft’s buying Fast Search & Transfer in 2008 and Hewlett Packard’s purchase of Autonomy in 2011 marked the end of the old-school enterprise search vendors. As you may recall, Fast Search was the subject of a criminal investigation and the HP Autonomy deal continues to make its way through the legal system. You may perceive these two deals as barn burners. I see them as capstones for the era during which search was marketed as the solution to information problems in organizations.
  • The word “search” has become confusing and devalued. For most people, “search” means the Danny Sullivan search engine optimization systems and methods. For those with some experience in information science, “search” means locating relevant information. SEO erodes relevance; the less popular connotation of the word suggests answering a user’s question. Not surprisingly, jargon has been used for many years in an effort to explain that “enterprise search” is infused with taxonomies, ontologies, semantic technologies, clustering, discovery, natural language processing, and other verbal chrome trim to make search into a Next Big Thing again. From my point of view, search is a utility and a code word for spoofing Google so that an irrelevant page appears instead of the answer the user seeks.
  • The enterprise search landscape (the title of one of my monographs) has been bulldozed and reworked. The money in the old school precision and recall type of search comes from consulting. Search Technologies was acquired by Accenture to add services revenue to the management consulting firm’s repertoire of MBA fixes. What is left are companies offering “solutions” which require substantial engineering, consulting, and training services. The “engine,” in many cases, is an open source system which one can download without burdensome license fees. From my point of view, search boils down to picking an open source solution. If those don’t work, one can license a proprietary system wrapped around open source. If one wants a proprietary system, there are some available, but these are not likely to reach the lofty heights of the Fast Search or Autonomy IDOL systems in the salad days of enterprise search and its promises of a universal search system. The universal search outfit Google pulled out of enterprise search for a reason.

I want to highlight five of the points in the 24-page write up. Please register to get your own copy of this document.

Here are my five highlights. My comments are in italics after each quote from the document:

Read more