My Refrigerator Door Shuts Automatically or Content Processing Vendor Works Hard at Repositioning
August 3, 2015
This weekend I checked out the flow of news from several dozen search and content processing vendors. What I discovered was surprising. For example, for the set of 36 vendors, there was zero substantive news about the companies’ information access technology. More disturbing were the hints of revenue difficulties; for example, New Zealand based SLI Systems, a publicly traded company, continues to lose money. Search and content processing sales challenges are forcing vendors to reposition themselves or align themselves with business trends which are more likely to have traction with senior managers.
How does a semantic technology company adapt? The approach is surprising, and it involves the Internet of Things. This is the push to put a Nest in your home and an Internet node in your appliances. One benefit is energy efficiency. The other idea is increased opportunities to push advertising to the hapless consumer who just wants to nuke a burrito in a microwave (smart or dumb microwave may not matter to a hungry teen).
I am not sure about your refrigerator. My double door General Electric refrigerator (what my grandmother called an “ice box” and some folks call a “fridge”) has doors which shut automatically. The refrigerator has an odd energy efficient sticker like the ones I remove from monitors which persist in going to sleep when my intelligence does not match the gizmo’s.
I understand that someday soon I will have a refrigerator with lots of intelligence. I am confident that with a few moments thought, I can kill that puppy’s brain.
In my narrow world, bounded by gun toting neighbors and dynamite crazed bridge builders, the Internet of Things or the somewhat odd acronym “IoT”, pronounced by my Spanish tutor “Eee ooooh tay”, will be a bit like Big Data, semantic search, natural language processing, artificial intelligence, and data lakes. The idea is that a search and content processing vendor can surf on a hot idea like fraud and pump some air into the sagging balloon labeled sales leads.
I am more convinced of this verbal magic each time I read about “new” technology from companies that are essentially vendors of look up functions applicable to information access.
The IoT is, in my opinion, more about getting information about a machine’s performance, the lessee’s adherence to maintenance schedules, and alerts about highly probable device failure.
One of my neighbors has a Mercedes which beeps, vibrates, and flashes when my neighbor strays across the white lines on the highway. Annoying but semi useful. The Mercedes also can phone home if my neighbor’s big expensive SUV experiences a malfunction. Useful. Maybe annoying if the malfunction occurs when the SUV is parked in front of the local Neiman Marcus or Goodwill store.
I read “Content Analysis and the Internet of Things: Never Leave the Fridge Door Open Again?” The main point of the write up is the question which I already answered. My refrigerator automatically shuts its door.
The article states:
The Internet of Things is the expanding network of physical objects that collect information, communicate and sense or interact with their internal states or the external environment according to Gartner, which reports that there will be nearly 26 billion devices on the Internet of Things by 2020.
Ah, yes, the mid tier firm Gartner, an excellent source of objective, unbiased, inclusion free information.
Here’s the article’s keeper passage I noted from a senior manager at a content processing company. Keep that phrase in mind: “content processing.”
With the common method of interaction, we will speak, devices will read, the design will be predicated upon our needs and less so upon the device. The trend seems so simple—for us to understand these devices, the devices must understand us. The difference is meaning. Data is an abstraction, understanding is communication, and to understand and communicate one must know meaning.
I am delighted that data have meaning. I just wonder how much of a stretch it is to apply text centric methods to outputs from an industrial machine connected to the Internet via an iGear service. My hunch is, “Not too much.”
To me the phrase “content processing” means words, not data output from my neighbor’s flashy Mercedes or an Internet enabled refrigerator.
As I said, my refrigerator door closes automatically. Do I want anyone to know that I let the hinges do the work?
Stephen E Arnold, August 3, 2015
Data Science, Senior Managers, and the Ever Interesting Notion of Truth
August 3, 2015
I read “Data Scientists to CEOs: You Can’t Handle the Truth.” I enjoy write ups about data science which start off with the notion of truth. I know that the “truth” referenced is the outputs of analytics systems.
Call me skeptical. If the underlying data are not normalized, validated, and timely, the likelihood of truth becomes even murkier than it was in my college philosophy class. Roger Ailes allegedly said:
Truth is whatever people will believe.
Toss in the criticism of a senior manager who in the US is probably a lawyer or an accountant, and you have a foul brew. Why would a manager charged with hitting quarterly targets or generating enough money to meet payroll quiver with excitement when a data scientist presents “truth”?
There is that pesky perception thing. There are frames of reference. There are subjective factors in play. Think of the dentist who killed Cecil. I am not sure data science will solve his business and personal challenges. Do you?
The write up is a silly fan rant for the fuzzy discipline of data science. Data science does not pivot on good old statisticians with their love of SAS and SPSS, fancy math, and 17th century notions of what constitutes a valid data set. Nope.
The data scientist has to communicate the known unknowns to his or her CEO. Shades of Rumsfeld. Does today’s CEO want to know more about the uncertainty in the business? The answer is, “Maybe.” But senior managers often get information that is filtered, shaped, and presented to create an illusion. Shattering those illusions can have some negative career consequences even for data scientists, assuming there is such a discipline as data science.
Evoking the truth from statistical processes which are output from systems configured by others can be interesting. Those threshold settings are not theoretical. Those settings determine what the outputs are and what they are “about.”
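A minimal sketch of why threshold settings matter. The account ids and scores are invented for illustration, and `flagged` is a hypothetical helper, not a function from any analytics product. The same data yields a different "truth" each time someone moves the cutoff.

```python
# Hypothetical scores from some analytics system; the values are made up.
scores = {"acct-01": 0.91, "acct-02": 0.74, "acct-03": 0.58, "acct-04": 0.33}


def flagged(scores, threshold):
    """Return the sorted ids whose score meets or exceeds the threshold."""
    return sorted(key for key, value in scores.items() if value >= threshold)


# Same data, three cutoffs, three different answers for the CEO.
for threshold in (0.9, 0.7, 0.5):
    print(threshold, flagged(scores, threshold))
```

Whoever picks the cutoff decides how many accounts are a "problem." The data scientist presenting the output may not be the person who configured it.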
Connecting an automated output to something that the data scientist asserts should be changed strikes me as somewhat parental. How does that work on a manager like Dick Cheney? How does that work on the manager of a volunteer committee working on a parent teacher luncheon?
I thought the Jack Benny program from the 1930s to 1960s was amusing. Some of the output about data science suggests that comedy may be a more welcoming profession than management based on truth from data science. Truth and statistics. Amazing comedy.
Stephen E Arnold, August 3, 2015
Sorry, Experts. NLP and Semantic Technology Will Guarantee Higher Precision and Recall
August 3, 2015
I read “5 Reasons for Developers to Build NLP and Semantic Search Skills.” It is one of those bait and switch write ups. The title suggests that NLP and semantic search are “skills.” The content of the article presents, without factual substantiation, assertions about the differences between Web search and enterprise search. The reality is that both are more closely related than they appear to some “experts.” Neither works particularly well for reasons which have to do with cost control, system management, and focus. The technology is, from my point of view, more stable than some search mavens believe.
Here’s the passage I highlighted in pale mauve because I did not have purple:
It at times feels magical that Search engines know, with unbelievable accuracy, exactly what you are looking for. This is the result of a heavy investment in NLP and Semantic technologies. These, along with speech-recognition, have the potential of enabling a future where search will transform into a smart machine that uses “connected knowledge” to answer significantly complex questions – a Star Trek Computer may not be too far away after all, if Amit Singhal – brain behind Google’s search engine evolution, has be to believed.
More remarkable was the introduction of the phrase “big, unstructured data.” I also found the notion of “commoditization” of data science amusing.
One idea warrants comment. The article calls attention to the “widening gap between enterprise search platforms and general purpose search engines.” Anyone who has attempted to index Web content quickly learns that it is a fruit basket which is in the process of being shoved into a blender. The notion of the enterprise search system was to process the content normally found inside an organization. But guess what? After the first query run on a restricted domain of content, the user says, “I need access to Internet content.” The “gap” is one of perception. The underlying components of the system and much of the gee whiz technology are similar. The fact that the Web search systems have been shaped to handle a restricted body of content is lost on some folks. Similarly the enterprise search systems are struggling because they, like Web search engines, cannot handle efficiently and automatically certain types of content. In short, neither works particularly well.
Will NLP and semantic skills help a developer? Not too much if the search system is not focused, the content is not reliable, and the functions are poorly defined. Forget big data, little data, and unstructured or structured data. Get the basics wrong and one has a lousy search system, which, sadly, is more common than not.
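For readers who want the basics pinned down, here is a minimal sketch of how precision and recall are computed for a single query. The document ids and the relevance judgments are invented, and `precision_recall` is a hypothetical helper, not part of any search product.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: the system returns four documents, five are relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7", "d8", "d9"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case half of what came back was useful and more than half of the useful material never surfaced. No amount of NLP branding changes that arithmetic.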
Stephen E Arnold, August 3, 2015
Bodleian Library Gets Image Search
August 3, 2015
There is a lot of free information on the Internet, but the veracity is always in question. While libraries are still the gateway to knowledge, many of their rarer, more historic works are buried in archives. These collections offer a wealth of information that is often very interesting. The biggest problem is that libraries often lack the funds to scan archival collections and create a digital library. Oxford University’s Bodleian Library, one of the oldest libraries in Europe, has the benefit of funds and an excellent collection to share with the world.
Digital Bodleian boasts over 115,179 images as of this writing and states that it is constantly updating the collection. The online library takes a modern approach to how users interact with the images by taking tips from social media. Not only can users browse and search the images randomly or in the pre-sorted collections, they can also create their own custom libraries and share them with friends.
It is a bold move for a library, especially for one as renowned as Bodleian, to embrace a digital collection and to offer a social media-like service. In my experience, digital library collections are bogged down by copyright and incomplete indices or ontologies, and they lack images to pique a user’s interest. Digital Bodleian is the opposite of many of its sister archives, but another thing I have noticed is that users are not too keen on joining a library social media site. It means having to sign up for yet another service, and their friends probably aren’t on it.
Here is an idea: how about a historical social media site similar to Pinterest that pulls records from official library archives? It would offer the ability to see the actual items, verify information, and even yield those clickbait top ten lists.
Whitney Grace, August 3, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Online Ads Discriminate
August 3, 2015
In our modern age, discrimination is supposed to be a thing of the past. When it does appear, people take to the Internet to vent their rage and frustrations, eager to point out this illegal activity. Online ads, however, lack human intelligence and are only as smart as their programmed algorithm. Technology Review explains in “Probing The Dark Side of Google’s Ad-Targeting System” that Google’s ad service makes inaccurate decisions when it comes to gender and other personal information.
A research team at Carnegie Mellon University and the International Computer Science Institute built AdFisher, a tool to track targeted third party ads on Google. AdFisher found that ads were discriminating against female users. Google offers a transparency tool that allows users to select what types of ads appear on their browsers, but even if you use the tool it doesn’t stop some of your personal information from being used.
“What exactly caused those specific patterns is unclear, because Google’s ad-serving system is very complex. Google uses its data to target ads, but ad buyers can make some decisions about demographics of interest and can also use their own data sources on people’s online activity to do additional targeting for certain kinds of ads. Nor do the examples breach any specific privacy rules—although Google policy forbids targeting on the basis of “health conditions.” Still, says Anupam Datta, an associate professor at Carnegie Mellon University who helped develop AdFisher, they show the need for tools that uncover how online ad companies differentiate between people.”
The transparency tool only controls some of the ads and third parties can use their own tools to extract data. Google stands by its transparency tool and even offers users the option to opt-out of ads. Google is studying AdFisher’s results and seeing what the implications are.
The study shows that personal data spills out on the Internet every time we click a link or use a browser. It is frightening how the data can be used and even hurtful if interpreted incorrectly by ads. The bigger question is not how retailers and Google use the data, but how do government agencies and other institutes plan to use it?
Whitney Grace, August 3, 2015
Big Data Lake: Are the Data Safe to Consume?
August 2, 2015
I read “The Analytics Journey Leading to the Business Data Lake.” Data lake is one of the terms floating around (pun definitely intended!) to stimulate sales. If one has a great deal of water, one needs a place to put it. Even though water is dammed, piped, used, recycled, and dumped—storage is the key.
Enter EMC, a company which is in the business of helping those with water store it and make use of that substance.
The write up reflects effort. I assume there was a PowerPoint slide deck in the mix. There are some snazzy graphics. Here’s one that caught my eye:
Instead of enterprise search being the go-to enterprise software solution, EMC has slugged in the following umbrella terms:
- Information ecosystem
- Business intelligence (perhaps an oxymoron in light of this article)
- Advanced analytics (obviously because regular analytics just are not zippy enough)
- Knowledge layer (I remain puzzled about knowledge because I have a tough time defining it. In fact, I resigned from my for fee knowledge management column because I just don’t know what the heck “knowledge” means.)
- The unfathomable data lake (yep, pun intended). What’s wrong with the word “storage” or “database” by the way?
- Master data which is also baffling. Is there servant data too?
- Machine data. Again I have no clue what this means.
The chart scatters undefined and fuzzy buzzwords like a crazed Jethro Tull, a water soluble blend of Jethro Tull (inventor of the seed drill) and Jethro Tull (the commercially successful and eccentric rock band).
The write up is important because EMC has sucked in the jargon and assertions once associated with enterprise search and applied them to the dark and mysterious data lake.
I highlighted:
Our data lake is one logical data platform with multiple tiers of performance and storage levels to optimally serve various data needs based on Service Level Agreements (SLA). It will provide a vast amount of structured and unstructured data at the Hadoop and Greenplum layers to data scientists for advanced analytics innovation. The higher performance levels powered by Greenplum and in-memory caching databases will serve mission-critical and real-time analytics and application solutions. With more robust data governance and data quality management, we can ensure authoritative, high-quality data driving all of EMC business insights and analytics driven applications using data services from the lake.
Ah, the Mariana Trench of enterprise information: Governance. Like “knowledge” and “advanced analytics”, governance has euphony. I think of the water lapping against the shore of Lake Paseco.
So what? Several observations:
- This type of “suggest lots” marketing ended poorly for a number of companies who used this type of rhetoric when marketing search.
- The folks who swallow this bait are likely to find themselves in a most uncomfortable spot.
- The problems associated with making use of information to improve decision making by reducing risk are not going to be solved by crazy diagrams and unsupported assertions.
EMC has been able to return to revenue growth. But the company’s profit margin has flat lined.
I am not sure that increasing the buzzword density in marketing write ups will help angle the red lines to low earth orbit. With better margins, it is much easier to check out the topographic view and see where lakes meet land.
Stephen E Arnold, August 2, 2015
Surprise! MBAs Do Not Make Use of Competitive Intelligence
August 2, 2015
I read “Companies Collect Competitive Intelligence but Don’t Use It.” The author, Ben Gilad, is a level headed person. His view is:
the competitive perspective is almost always the least important aspect in managerial decision-making. Internal operational issues including execution, budgets, and deadlines are paramount in a company’s deliberation, but what other players will do is hardly ever in focus. This “island mentality” is surprisingly prevalent among talented, seasoned managers.
What’s the fix?
Gilad seems to realize the magnitude of the challenge. He states:
a company can’t force its managers to use information optimally. It can, however, ensure they at least consider it. In many areas of the corporation, mandatory reviews are routine- regulatory, legal, financial reviews are considered the norm. Ironically, competitive reviews are not, even though the cost of missing out on understanding the competitive environment can be enormous.
In short, MBAs talk the way they learned in Harvard-type business schools. The walk, on the other hand, is different.
From my point of view, biased by my work at Booz, Allen & Hamilton before it became the two separate outfits Booz and Booz, Allen, I hear a different drum cadence.
- Managers are unable to deal effectively with available information. As a result, many are emulating the leatherback sea turtle. Shutting down and making decisions based on what other turtles say is the preferred course of action.
- A number of MBAs shift the discussion to data. The notion that competitive insights may be based on inputs which are tough to quantify is sufficient evidence to accept the outputs of an Excel spreadsheet or some canned analysis ginned up by an intern at a mid tier consulting firm.
- Quite a few senior managers, in my experience, live in a state of fear. The happy attitude and rah rah, go team approach is like a coat of drive through car wax. Beneath the surface, there is real concern about keeping a job, dealing with life’s little challenges, and being able to pull off another Board meeting.
Competitive intelligence, like business intelligence and military intelligence, gets quite a bit of marketing attention. But in today’s business environment, turtles, data addicts, and cheerleaders stumble with basics.
The evidence falls readily to hand: Security woes at government agencies, fumbling with immigrants in Calais, automobiles which can be hacked, and enterprise search systems which cannot locate information.
From my point of view, the problem is cross cultural and deeper than competitive intelligence. Executives struggle with strategy, planning, and personal conduct too.
Perhaps business schools and management experts are not symptoms but triggers?
Stephen E Arnold, August 2, 2015
Elasticsearch: A Useful Overview
August 1, 2015
Want to shake free of the proprietary search and retrieval systems? I don’t blame you. Irregular and slow bug fixes and licensing handcuffs are two good reasons. Remember: The cost of search is not the licensing fee. The cost is a collection of fees, purchases, and expenses which burden every search system with which I am familiar.
Elasticsearch is the go to solution at this time in my opinion. If you want a useful overview of Elasticsearch, check out the Slideshare presentation “Introduction to ElasticSearch.” You may have to “join” LinkedIn / Slideshare to do anything useful, however.
The deck was prepared / delivered in the spring of 2015 by Roy Russo who is affiliated with or is “DevNexus.” The information is jargon free, an approach which the whiz kids at LucidWorks (Really?) may want to imitate. The presentation does contain a couple of buzzwords like NGram, but no MBA speak.
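Since NGram is the one buzzword that slipped through, here is a minimal sketch of what a character n-gram is. This is plain Python for illustration only. It is not Elasticsearch configuration and not code from the deck, and `char_ngrams` is a hypothetical helper name.

```python
def char_ngrams(text, n):
    """Return the overlapping character n-grams of text, lowercased."""
    text = text.lower()
    # A string shorter than n yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(char_ngrams("search", 3))
```

Indexing fragments like these is one common way a search system can match partial words or tolerate some misspellings.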
Stephen E Arnold, August 1, 2015
Endeca: Facets of Novelty
August 1, 2015
I am no specialist in the arcane art of legal eagle spotting. I did notice some references to a dust up between an outfit called Speedtrack and licensees of Endeca’s ageing search technology.
The Speedtrack outfit seems to have rights to an invention called “Method for Accessing Computer Files and Data, Using Linked Categories Assigned to Each Data File Record on Entry of the Data File Record.” This is explained brilliantly in US5544360, filed in February 1995.
Here’s a diagram showing how the user can click on categories to locate information. No typing required.
Compare this to Endeca’s invention, “Hierarchical Data Driven Navigation System and Method for Information Retrieval.” This is US7062483, filed in 2001. You may find US7035864 and US7325201 interesting as well.
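For readers who have not used category-driven navigation, here is a minimal sketch of the general idea: each record carries categories, and each click narrows the result set. This is my own illustration with invented records and hypothetical function names. It is not the method claimed in any of the patents named above.

```python
from collections import Counter

# Invented records; each carries the categories assigned when it was entered.
records = [
    {"id": 1, "categories": {"wine", "red", "france"}},
    {"id": 2, "categories": {"wine", "white", "france"}},
    {"id": 3, "categories": {"wine", "red", "chile"}},
]


def narrow(records, selected):
    """Keep only the records that carry every selected category."""
    return [r for r in records if selected <= r["categories"]]


def remaining_facets(records, selected):
    """Count the categories still available for the next click."""
    counts = Counter()
    for record in narrow(records, selected):
        counts.update(record["categories"] - selected)
    return dict(counts)


selected = {"wine", "red"}
print([r["id"] for r in narrow(records, selected)])
print(remaining_facets(records, selected))
```

The user clicks "wine" and then "red," and the system offers only the categories that still lead somewhere. No typing required.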
“Federal Circuit Reaffirms Kessler Doctrine As A Patent Infringement Defense For Customers” explains that the Speedtrack infringement case pivots on the Kessler doctrine. Here’s the explanation from the JDSupra.com article:
First, unlike res judicata, which is a defense that is personal to the parties in a prior litigation, the Kessler Doctrine “attaches to the [accused] product itself” and precludes a patentee from reasserting the same patent against the same (or “essentially the same”) product in a subsequent action.
The article then noted:
Second, the Federal Circuit ruled that the Kessler doctrine may be raised by customers as well as the product manufacturer or supplier.
What I found fascinating was this infringement related statement attributed to the presiding legal eagle:
Third, the Federal Circuit held that the Kessler doctrine applied to Speedtrack’s claim even though the Endeca software allegedly infringed only when combined with the customer’s own computer hardware.
I recall that Endeca’s faceted navigation burst upon the scene in the late 1990s. Who knew that Jerzy Lewak (co founder of Speedtrack), Slawek Grzechnik, and Jon Matousek seemed to be trying to figure out a way around the problem of keyword search before Endeca?
I wonder if Oracle were surprised too. I have a hunch Speedtrack was.
Stephen E Arnold, August 1, 2015
Darktrace: A Kin of Kinjin?
August 1, 2015
Many years ago I loaded a software application from Autonomy. The application watched what I was “doing” and automatically displayed search results sort of relevant to what the software thought I was writing.
Flash forward to now. I read “Mike Lynch’s Cyber security Startup Darktrace Valued at More than £60m.” The point of the write up is that Dr. Mike Lynch has what looks like another success in his digital Bialette k6857 Mocha Express machine.
Darktrace monitors digital flows for signals. Instead of displaying search results, the system alerts security officers of a probable issue. Maybe Kinjin is not the influencer of the system. No matter. The company is “valued at more than $100 million.”
Several observations:
- The Hewlett Packard Autonomy hassle has not spoiled Dr. Lynch’s coffee
- Dr. Lynch is once again moving into a market sector in which some of the competitors are likely to be unaware of Dr. Lynch’s electric powered kitchen appliance taking over their coffee machine.
- Hewlett Packard may want to ask and answer: “Why did we lose this fellow?”
My hunch is that HP won’t ask the question and may not admit that the answer is not just technology. The murky world of management spoils an otherwise pristine cup of java. That’s a $100 million cup of joe.
Stephen E Arnold, August 1, 2015