Quality and Text Processing: An Old Couple Still at the Altar

August 6, 2015

I read “Why Quality Management Needs Text Analytics.” I learned:

To analyze customer quality complaints to find the most common complaints and steer the production or service process accordingly can be a very tedious job. It takes time and resources.

This idea is similar to the one expressed by Ronen Feldman in a presentation he gave in the early 2000s. My notes of the event record that he reviewed the application of ClearForest technology to reports from automobile service professionals which presented customer comments and data about repairs. ClearForest’s system was able to pinpoint that a particular mechanical issue was emerging. The client responded to the signals from the ClearForest system and took remediating action. The point was that sometime in the early 2000s, ClearForest had built and deployed a text analytics system with a quality-centric capability.
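The kind of signal described above can be illustrated with a very small sketch. This is not ClearForest's method, which my notes do not describe in detail; it is plain phrase counting over service notes grouped by month. The sample notes, the watch list, and the function names are all invented for illustration.

```python
from collections import Counter

# Hypothetical service notes grouped by month; invented for illustration only.
REPORTS = {
    "2001-01": ["customer reports brake squeal", "oil change", "replaced wiper blades"],
    "2001-02": ["brake squeal at low speed", "transmission slips in second gear", "oil change"],
    "2001-03": ["transmission slips when cold", "transmission slips on hills",
                "transmission slips, fluid low"],
}

# Hypothetical phrases an analyst wants to track.
WATCH_TERMS = ["brake squeal", "transmission slips", "oil change"]


def monthly_counts(reports, terms):
    """Count how many notes mention each watched phrase, month by month."""
    counts = {}
    for month, notes in sorted(reports.items()):
        tally = Counter()
        for note in notes:
            text = note.lower()
            for term in terms:
                if term in text:
                    tally[term] += 1
        counts[month] = tally
    return counts


def emerging_terms(counts, min_latest=3):
    """Return phrases that rose in the latest month and reached min_latest mentions."""
    months = sorted(counts)
    latest, previous = counts[months[-1]], counts[months[-2]]
    return [term for term in latest
            if latest[term] >= min_latest and latest[term] > previous[term]]


flagged = emerging_terms(monthly_counts(REPORTS, WATCH_TERMS))
print(flagged)
```

A production system would need entity extraction, normalization of synonyms, and a statistical test rather than a fixed threshold. The sketch only shows why counting complaints over time can surface a problem before it becomes obvious.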

I mention this point because many companies are recycling ideas and concepts which are in some cases long beards. ClearForest was acquired by the estimable Thomson Reuters. Some of the technology is available as open source at Calais.

In search and content processing, the case examples, the lingo, and even the technology have entered what I call their “recycling” phase.

I learned about several new search systems this week. I looked at each. One was a portal, another a metasearch system, and a third a privacy centric system with a somewhat modest index. Each was presented as new, revolutionary, and innovative. The reality is that today’s information highways are manufactured from recycled plastic bottles.

Stephen E Arnold, August 6, 2015

Rocket AeroText Search: Stretching the Access Concept

August 6, 2015

I did a quick check on AeroText search. I assume that even the most jejune enterprise search expert is familiar with this system. What I noticed is that AeroText now moves beyond search into six separate functions. These reminded me of Fast Search & Transfer’s approach in the 2006-2007, pre-implosion period.

The six functions, which you can read about and request a demo of, are at this link. These are:

  1. Folio Views. The idea is that basic search and retrieval are provided by Rocket.
  2. Folio Builder. The idea is that information can be organized into folders for research purposes.
  3. Folio Publisher. A commercial publishing company can package its information and sell it in digital form.
  4. Folio Integrator. This is a software development kit.
  5. NXT Enterprise Server. This is the enterprise centric content processing and search system.
  6. NXT Professional Publishing Server. This is a “suite for storing, assembling, securing, and distributing content” which includes search.

If you have a copy of one of the first three editions of the Enterprise Search Report I wrote between 2003 and 2006, you will be able to check out the similarities. I present some of the Fast Search nomenclature in this 2012 article.

I find the marketing and positioning of Autonomy and Fast Search interesting. These companies’ themes are as fresh today as they were years ago.

Stephen E Arnold, August 6, 2015

Hey Google Doubters, Burn This into Your Memory

August 6, 2015

It has been speculated that Google would lose its ad profits as mobile search begins to dominate the search market but Quartz tells a different story in the article, “Mobile Isn’t Ruining Google’s Search Business After All.”  Google’s revenue continues to grow, especially with YouTube, but search remains its main earner.

According to the second-quarter earnings, Google earned $12.4 billion from Google Web sites, a $1.5 billion increase from last year. Google continues to grow on average $1.6 billion per quarter. Maintaining continuous growth suggests that Google is weathering the shift to mobile search. Here is some other news: the mobile search revolution is now, not in the future.

“That is, if mobile really was going to squeeze Google’s search advertising business, we probably would have already seen it start by now. Smartphone penetration keeps deepening—with 75% saturation in the US market, according to comScore. And for many top media properties, half of the total audience only visits on mobile, according to a recent comScore report on mobile media consumption.”

There are new actions that could either impede or help Google search, such as deep linking between apps and the Web and predictive information services, but these are still brand new and their full effect has not been determined.

Google refuses to be left behind in the mobile search market and stands to be a main competitor for years to come.

Whitney Grace, August 6, 2015

Sponsored by, publisher of the CyberOSINT monograph


Google: Technical Debt Has Implications for Some AI Cheerleaders

August 5, 2015

If you are interested in smart software, you may want to read “Machine Learning: the High Interest Credit Card of Technical Debt.” I like the credit card analogy. It combines big costs with what some folks see as a something-for-nothing feature of the modern world.

The write up is important because it makes clear the future cost of using certain machine learning methods. The paper helps explain why search and content processing companies often burn more cash than available.

The paper identifies specific cost points which most MBAs happily ignore or downplay in post mortems of failed search and content processing companies. The whiz kids, both boys and girls, rationalize their failure to deal with shifting boundaries, “dark dependencies,” expensive spaghetti, and the tendency of smart software to sort of drift off center.

There is a fix. It is just darned expensive, like credit card debt when the clueless consumer covers only the interest.

Applying the Google paper to search and content processing vendors, the only positive financial outcome is to sell the dog before it dies. Shift the search and content problem “credit card debt” to some other firm.

Perhaps that helps explain the Lexmark financial challenge and the dismay at Hewlett Packard as the reality of Autonomy dawned on those quick to spend billions.

Worth reading. Well done, Googlers.

Stephen E Arnold, August 5, 2015

Dr. Watson: Concerned about Flabbiness and Sugar

August 5, 2015

The IBM PR attack continues. Today’s installment pits Watson (IBM’s Jeopardy winning, post production blind, Lucene based smart software) against flab and short chain soluble carbohydrates. Think diabetes or, worse, a visit to the dentist.

Navigate to “Dr. Watson: IBM Plans to Use big Data to Manage Diabetes and Obesity.” The story is not new. Once again IBM is reporting a “team up” deal. I wish the stories about Watson would talk about landing very large contracts with major government entities or Fortune 100 firms. I cannot get excited about old fashioned data mining applications. Sorry. Call me jaded.

The write up states without one whit of skepticism:

This new partnership marks a substantial leap into the healthcare sector for IBM, with CVS joining the likes of Apple and Medtronic as partners of IBM’s growing data service, Watson Health. By partnering up with CVS, Watson will be able to analyze and learn from “an unprecedented mix of health information sources”, including medical records, medical insurance claims and data from smart fitness devices.

I found the notion of the UK’s National Health Service hooking up with IBM an interesting one. Does the NHS have a functioning computer infrastructure? Has the promise of taxonomies delivered something useful to its intended users?

IBM might be able to help with systems. Will Watson remediate the NHS findability challenges? What will NHS pay to get Dr. Watson on the job? Has anyone involved in the Alphr (a former PC oriented outfit?) used Watson?

I don’t think much more happens with these Watson stories than recycling what Watson’s team generates with rather amazing regularity.

Where are the billion dollar plus revenues? That is important to me.

Stephen E Arnold, August 5, 2015

YouTube Wants You to Pay For…YouTube Content?

August 5, 2015

YouTube is free, and that is one of the biggest draws for viewers. Viewers pull the plug on cable and instead watch TV and movies on the Internet or via a streaming device. While YouTube might be free, video streaming services like Hulu, Netflix, and Amazon Prime offer network television for a fraction of the cable price. Google wants in on the streaming service game, and it is already prepped with YouTube. Google’s only problem is that it does not have major TV networks signed up. Slash Gear explains in the article “YouTube’s Upcoming Paid Service Hasn’t Signed Up TV Networks.” Cheaper access to network TV is one of the main reasons that viewers sign up for a video streaming service. Without the networks, YouTube has a problem:

“What is most notable, however, is what is missing: TV networks. And according to sources, YouTube hasn’t at this point signed up any of those networks like NBC and Fox. Those networks would bring with them their popular shows, and those popular shows would bring in viewers. That doesn’t mean the networks will never be brought in — sources said there’s still time for them to get on board, as the rollout isn’t pegged for until later this year.”

Google is currently counting on YouTube stars to power the paid platform, which users will be able to watch ad free.  Without network TV, a larger movie library, and other content, paying for YouTube probably will not have many takers.  Why pay for already free videos, when all you have to do is watch a thirty-second ad?

Whitney Grace, August 5, 2015




Coauthoring Documents in SharePoint to Save Time

August 4, 2015

SharePoint users are often looking for ways to save time and streamline the process of integration from other programs. Business Management Daily has devoted some attention to the topic with their article, “Co-authoring Documents in SharePoint and Office.” Read on for the full details of how to make the most of this feature.

The article begins:

“One of the best features of SharePoint 2010 and 2013 is the way it permits co-authoring. Co-authoring means more than one person is in a document, workbook or presentation at the same time editing different parts. It works differently in Word, Excel and PowerPoint . . . With Word 2013/SharePoint 2013, co-authors may edit either in Word Online (Word Web App) or the desktop version.”

SharePoint is a powerful but complicated solution that requires quite a bit of energy to maintain and use to the best of its ability. For those users and managers who are tasked with daily work in SharePoint, staying in touch with the latest tips and tricks is vital. Those users may benefit from Stephen E. Arnold’s Web site. A longtime leader in search, Arnold brings the latest SharePoint news together in one easy to digest news feed.

Emily Rae Aldridge, August 4, 2015


Hire Watson As Your New Dietitian

August 4, 2015

IBM’s supercomputer Watson is being “trained” in various fields, such as healthcare, app creation, customer service relations, and creating brand new recipes. The applications for Watson are possibly endless. The supercomputer is combining its “skills” from healthcare and recipes by trying its hand at nutrition. Welltok invented the CaféWell Health Optimization Platform, a PaaS that creates individualized healthcare plans, and it has added Watson’s big data capabilities to its Healthy Dining CaféWell personal concierge app. eWeek explains in “Welltok Takes IBM Watson Out To Dinner” how the app can offer clients personalized restaurant menu choices.

” ‘Optimal nutrition is one of the most significant factors in preventing and reversing the majority of our nation’s health conditions, like diabetes, overweight and obesity, heart disease and stroke and Alzheimer’s,’ said Anita Jones-Mueller, president of Healthy Dining, in a statement. ‘Since most Americans eat away from home an average of five times each week and it can be almost impossible to know what to order at restaurants to meet specific health needs, it is very important that wellness and condition management programs empower  smart dining out choices. We applaud Welltok’s leadership in providing a new dimension to healthy restaurant dining through its groundbreaking CaféWell Concierge app.’”

Restaurant menus are very vague when it comes to nutritional information. When it comes to knowing if something is gluten-free, spicy, or a vegetarian option, the menu will state it, but all other information is missing. In order to find a restaurant’s nutritional information, you have to hit the Internet and conduct research. A new law will force restaurants to post calorie counts, but that will not include the amount of sugar, sodium, and other information. People have been making poor eating choices, partially due to the lack of information. If they know what they are eating, they can improve their health. If Watson’s abilities can decrease the US’s waistline, it is for the better. The bigger challenge would be to get people to use the information.

Whitney Grace, August 4, 2015


My Refrigerator Door Shuts Automatically or Content Processing Vendor Works Hard at Repositioning

August 3, 2015

This weekend I checked out the flow of news from several dozen search and content processing vendors. What I discovered was surprising. For example, for the set of 36 vendors, there was zero substantive news about the companies’ information access technology. More disturbing were the hints of revenue difficulties; for example, New Zealand based SLI Systems, a publicly traded company, continues to lose money. Search and content processing sales challenges are forcing vendors to reposition themselves or align themselves with business trends which are more likely to have traction with senior managers.


How does a semantic technology company adapt? The approach is surprising, and it involves the Internet of Things. This is the push to put a Nest in your home and an Internet node in your appliances. One benefit is energy efficiency. The other idea is increased opportunities to push advertising to the hapless consumer who just wants to nuke a burrito in a microwave (smart or dumb microwave may not matter to a hungry teen).

I am not sure about your refrigerator. My double door General Electric refrigerator (what my grandmother called an “ice box” and some folks call a “fridge”) has doors which shut automatically. The refrigerator has an odd energy efficient sticker like the ones I remove from monitors which persist in going to sleep when my intelligence does not match the gizmo’s.

I understand that someday soon I will have a refrigerator with lots of intelligence. I am confident that with a few moments’ thought, I can kill that puppy’s brain.

In my narrow world, bounded by gun toting neighbors and dynamite crazed bridge builders, the Internet of Things or the somewhat odd acronym “IoT”, pronounced by my Spanish tutor “Eee ooooh tay”, will be a bit like Big Data, semantic search, natural language processing, artificial intelligence, and data lakes. The idea is that a search and content processing vendor can surf on a hot idea like fraud and pump some air into the sagging balloon labeled sales leads.

I am more convinced of this verbal magic each time I read about “new” technology from companies that are essentially vendors of look up functions applicable to information access.

The IoT is, in my opinion, more about getting information about a machine’s performance, the lessee’s adherence to maintenance schedules, and alerts about highly probable device failure.

One of my neighbors has a Mercedes which beeps, vibrates, and flashes when my neighbor strays across the white lines on the highway. Annoying but semi useful. The Mercedes also can phone home if my neighbor’s big expensive SUV experiences a malfunction. Useful. Maybe annoying if the malfunction occurs when the SUV is parked in front of the local Neiman Marcus or Goodwill store.

I read “Content Analysis and the Internet of Things: Never Leave the Fridge Door Open Again?” The main point of the write up is the question which I already answered. My refrigerator automatically shuts its door.

The article states:

The Internet of Things is the expanding network of physical objects that collect information, communicate and sense or interact with their internal states or the external environment according to Gartner, which reports that there will be nearly 26 billion devices on the Internet of Things by 2020.

Ah, yes, the mid tier firm Gartner, an excellent source of objective, unbiased, inclusion free information.

Here’s the article’s keeper passage I noted from a senior manager at a content processing company. Keep that phrase in mind: “content processing.”

With the common method of interaction, we will speak, devices will read, the design will be predicated upon our needs and less so upon the device. The trend seems so simple—for us to understand these devices, the devices must understand us. The difference is meaning. Data is an abstraction, understanding is communication, and to understand and communicate one must know meaning.

I am delighted that data have meaning. I just wonder how much of a stretch it is to apply text centric methods to outputs from an industrial machine connected to the Internet via an iGear service. My hunch is, “Not too much.”

To me the phrase “content processing” means words, not data output from my neighbor’s flashy Mercedes or an Internet enabled refrigerator.

As I said, my refrigerator door closes automatically. Do I want anyone to know that I let the hinges do the work?

Stephen E Arnold, August 3, 2015

Sorry, Experts. NLP and Semantic Technology Will Guarantee Higher Precision and Recall

August 3, 2015

I read “5 Reasons for Developers to Build NLP and Semantic Search Skills.” It is one of those bait and switch write ups. The title suggests that NLP and semantic search are “skills.” The content of the article presents, without factual substantiation, assertions about the differences between Web search and enterprise search. The reality is that both are more closely related than they appear to some “experts.” Neither works particularly well for reasons which have to do with cost control, system management, and focus. The technology is, from my point of view, more stable than some search mavens believe.

Here’s the passage I highlighted in pale mauve because I did not have purple:

It at times feels magical that Search engines know, with unbelievable accuracy, exactly what you are looking for. This is the result of a heavy investment in NLP and Semantic technologies. These, along with speech-recognition, have the potential of enabling a future where search will transform into a smart machine that uses “connected knowledge” to answer significantly complex questions – a Star Trek Computer may not be too far away after all, if Amit Singhal – brain behind Google’s search engine evolution, has be to believed.

More remarkable was the introduction of the phrase “big, unstructured data.” I also found the notion of “commoditization” of data science amusing.

One idea warrants comment. The article calls attention to the “widening gap between enterprise search platforms and general purpose search engines.” Anyone who has attempted to index Web content quickly learns that it is a fruit basket which is in the process of being shoved into a blender. The notion of the enterprise search system was to process the content normally found inside an organization. But guess what? After the first query run on a restricted domain of content, the user says, “I need access to Internet content.” The “gap” is one of perception. The underlying components of the system and much of the gee whiz technology are similar. The fact that the Web search systems have been shaped to handle a restricted body of content is lost on some folks. Similarly the enterprise search systems are struggling because they, like Web search engines, cannot handle efficiently and automatically certain types of content. In short, neither works particularly well.

Will NLP and semantic skills help a developer? Not too much if the search system is not focused, the content is not reliable, and the functions are poorly defined. Forget big data, little data, and unstructured or structured data. Get the basics wrong and one has a lousy search system, which, sadly, is more common than not.
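For readers who want to check whether a search system is lousy, precision and recall are the usual yardsticks. The sketch below uses the standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The document identifiers and the function name are hypothetical, and the sketch assumes someone has already judged which documents are relevant to the query.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query from two lists of document IDs."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    # Guard against empty sets so an empty result list does not divide by zero.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query results and relevance judgments, invented for illustration.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The hard part is not the arithmetic. It is getting reliable relevance judgments for a focused body of content, which is one of the basics mentioned above.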

Stephen E Arnold, August 3, 2015
