How Big Data Is Missing the Mark

January 5, 2016

At this point in the Big Data sensation, many businesses are swimming in data without the means to leverage it effectively. TechWeek Europe cites a recent survey from storage provider Pure Storage in its write-up, “Big Data ‘Fails Businesses’ Due to Access, Skills Shortage.” Interestingly, most of the problems seem to have more to do with human procedures and short-sightedness than any technical shortcomings. Writer Tom Jowitt lists the three top obstacles as a lack of skilled workers, limited access to information, and bureaucracy. He tells us:

“So what exactly is going wrong with Big Data to be causing such problems? Well over half (56 percent) of respondents said bureaucratic red tape was the most serious obstacle for business productivity. ‘Bureaucratic red tape around access to information is preventing companies from using their data to find those unique pieces of insight that lead to great ideas,’ said [Pure Storage’s James] Petter. ‘Data ownership is no longer just the remit of the CIO, the democratisation of insight across businesses enables them to disrupt the competition.’ But regulations are also causing worry, with one in ten of the companies citing data protection concerns as holding up their dissemination of information and data throughout their business. The upcoming EU General Data Protection Regulation will soon affect every single company that stores data.”

The survey reports that missed opportunities have cost businesses billions of pounds per year, and almost three-quarters of respondents say their organizations collect data that then just gathers dust. Cost and time are both reasons that information remains unprocessed. On a brighter note, Jowitt points to another survey, by CA Technologies, in which most respondents expect the situation to improve and their data collections to bring profits down the road. Let us hope they are correct.

 

Cynthia Murrell, January 5, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Rethinking the J.D. As Artificial Intelligence Takes Over Lawyers’ Work

January 5, 2016

The article titled “Report: Artificial Intelligence Will Cause ‘Structural Collapse’ of Law Firms by 2030” on Legal Futures posits that AI will take over legal practice in the near future. Jomati Consultants LLP released the report “Civilization 2030: The Near Future for Law Firms,” which estimates that as population growth slows, legal work will be directed mainly toward geriatric advice and litigation. The article states:

“The report’s focus on the future of work contained the most disturbing findings for lawyers… By [2030], ‘bots’ could be doing ‘low-level knowledge economy work’ and soon much more. ‘Eventually each bot would be able to do the work of a dozen low-level associates. They would not get tired. They would not seek advancement. They would not ask for pay rises. Process legal work would rapidly descend in cost.’ The human part of lawyering would shrink.”

The article goes on in great detail about who will be affected. Partners will come out on top (no surprises there), but associates, particularly those doing billable work rather than client-facing work, will be in much less demand. This may be difficult for the hordes of law school graduates produced each year, as their positions are increasingly taken over by AI technology. Time to rethink that law degree and consider a career path tailored to human skills.

Chelsea Kerwin, January 5, 2016


Google and Its 10 Important Moves in 2015

January 4, 2016

Short honk: Navigate to “Year in Review: These Were the 10 Most Important Moves That Google Made in 2015.” Scan the list. Notice that none of these “most important moves” involved search. I would suggest that one “important move” be added to the list: the impetus for “the right to be forgotten” gained steam. As a result, it is tough to search for something when the pointer is not in the public index. My take: Google has marginalized precision, recall, and relevance. For those who run queries across multiple systems and perform old-fashioned information collection, such as talking to folks with some knowledge of a matter, this is no big deal. For others? Think about it.

Stephen E Arnold, January 4, 2016

Fasten Your Seat Belts: Search Driven Analytics

January 4, 2016

Editor’s Note: ThoughtSpot has no relationship with EMC.

The buzzword meisters are salivating. A term kicked around by folks like Lucidworks (really?) and Radiology Software has been snapped up by EMC. Yep, I know. EMC is not a search vendor, and I was surprised to learn that it was in the analytics business. Hey, that’s what happens when one lives in rural Kentucky.

According to EMC, the “new” concept is the spark behind ThoughtSpot. I learned from “Introducing ThoughtSpot 3: The World’s First Product to Harness Collective Intelligence for Search Driven Analytics”:

ThoughtSpot 3 combines the ease of search with the intelligence of machine learning to deliver a powerful analytic solution that anyone can use to quickly get the right answers out of their data.

Slam dunk. Stock up on EMC shares, which are trading in value territory. The company has reported flat revenues and profit margins, but search-driven analytics, now in Version 3, is something that makes mid-tier consulting firms quiver.


Aberdeen allegedly said:

“As the desire for data-driven decisions grows across the business world, there is a greater appetite for people capable of creating data insights,” said Aberdeen Vice President and Principal Analyst Michael Lock. “For companies looking to create insights faster and more easily, early findings from Aberdeen’s latest survey indicate that Best-in-Class organizations are adopting language-driven analytics, for example search-driven analytics and code-free discovery, at a greater rate than lesser performers.”

That’s sufficient for me. Now we just need to watch the revenues of EMC and other vendors almost certain to embrace a buzzword with some rubber left on the 15-inch recap.

Stephen E Arnold, January 4, 2016

Brin in Indonesia to Talk Loon Balloon

January 4, 2016

I read “Sergey Brin visits Indonesia, Talks Project Loon.” The write up included a nifty picture of a Loon balloon.

[Image: Google Loon launch event]

I noted this quote attributed to Mr. Brin:

“Players like Go-Jek already have a huge role. Google is very interested in also playing a role in this ecosystem,” said Brin according to CNN.

Mr. Brin apparently made a surprise visit. I love the picture of the Loon balloon/blimp object.

Stephen E Arnold, January 4, 2016

IBM Generates Text Mining Work Flow Diagram

January 4, 2016

I read “Deriving Insight Text Mining and Machine Learning.” This is an article with a specific IBM Web address. The diagram is interesting because it does not explain which steps are automated, which require humans, and which are one of those expensive man-machine processes. When I read about any text related function available from IBM, I think about Watson. You know, IBM’s smart software.

Here’s the diagram:

[Diagram: IBM text mining and machine learning work flow; image not reproduced]

If you find this hard to read, you are not in step with modern design elements. Millennials, I presume, love these faded colors.

Here’s the passage I noted about the important step of “attribute selection.” I interpret attribute selection to mean indexing, entity extraction, and related operations. Because neither human subject matter specialists nor smart software performs this function particularly well, I highlighted it in red ink in recognition of IBM’s 14 consecutive quarters of financial underperformance:

Machine learning is closely related to and often overlaps with computational statistics—a discipline that also specializes in prediction-making. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. It is employed in a range of computing tasks where designing and programming explicit algorithms is infeasible. Example applications include spam filtering, optical character recognition (OCR), search engines and computer vision. Text mining takes advantage of machine learning specifically in determining features, reducing dimensionality and removing irrelevant attributes. For example, text mining uses machine learning on sentiment analysis, which is widely applied to reviews and social media for a variety of applications ranging from marketing to customer service. It aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation, affective state or the intended emotional communication. Machine learning algorithms in text mining include decision tree learning, association rule learning, artificial neural learning, inductive logic programming, support vector machines, Bayesian networks, genetic algorithms and sparse dictionary learning.

Interesting, but how does this IBM stuff actually work? Who uses it? What’s the payoff from these use cases?

The write up raises more questions than it answers about the hard-to-read diagram, which looks quite a bit like a 1998 Autonomy graphic. I recall being able to read the Autonomy image, however.
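The IBM passage says machine learning helps with “removing irrelevant attributes” but does not show what that looks like. Here is a minimal Python sketch of one generic, textbook approach, document frequency filtering. It is not IBM’s method; the function names, thresholds, and sample reviews are hypothetical. Terms that appear in too few documents are treated as noise, and terms that appear in nearly every document are treated as carrying no signal.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on anything that is not a letter; deliberately crude.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def select_attributes(docs, min_df=2, max_df_ratio=0.8):
    """Keep terms found in at least min_df documents and in no more than
    max_df_ratio of all documents. Rare terms are treated as noise;
    near-universal terms are treated as uninformative."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    ceiling = max_df_ratio * len(docs)
    return sorted(t for t, n in df.items() if min_df <= n <= ceiling)

# Hypothetical product reviews.
docs = [
    "the battery life is great",
    "the battery died after a week",
    "the screen is great but the battery is poor",
    "the screen is great",
]
print(select_attributes(docs))  # ['battery', 'great', 'is', 'screen']
```

On these four reviews, “the” is dropped because it shows up everywhere, and one-off words such as “week” are dropped as too rare. The word “is” survives, which is why real systems layer stop lists, weighting, and human review on top of a filter like this.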

Stephen E Arnold, December 30, 2015

Are Search Unicorns Sub Prime Unicorns?

January 4, 2016

The question is a baffler. Navigate to “Sorting Truth from Myth at Technology Unicorns.” If the link is bad or you have to pay to read the article in the Financial Times, pony up, go to the library, or buy hard copy. Don’t complain to me, gentle reader. Publishers are in need of revenue. Now the write up:

The assumption is that a unicorn exists. What exists are firms with massive amounts of venture funding and billion-dollar valuations. I know the money is or was real, but the “sub prime unicorn” is a confection from money thought leader Michael Moritz. A subprime unicorn is a company “built on the flimsiest of edifices.” Does this mean fairy dust or something more substantial?

According to the write up:

But the way in which private market valuations have become skewed and inflated as start-ups have delayed IPOs raises questions about the financing of innovation. Despite the excitement, venture capital has produced weak returns in recent decades — only a minority of funds have produced rewards high enough to compensate investors for illiquidity and opacity.

Why would venture-funded start-ups perform better than a start-up financed by mom, dad, and one’s slightly addled, but friendly, great aunt?

The article then makes a reasonably sane point:

With the rise in US interest rates, the era of ultra-cheap financing is ending. As it does, Silicon Valley’s unicorns are losing their mystique and having to work to raise equity, sometimes at valuations below those they achieved before. The promise of private financing is being tested, and there will be disappointments. It does not pay to be dazzled by mythical beasts.

Let’s think a moment about search and content processing. The mid-tier consulting firms (the outfits I call azure chip firms) have generated some pretty crazy estimates about the market size for search and content processing solutions.

The reality is at odds with these speculative, marketing-fueled prognostications. Yep, I would include the wizards at IDC who wanted $3,500 to sell an eight-page document with my name on it without my permission. Refresh yourself on the IDC Schubmehl maneuver at this link.

Based on my research, two enterprise search outfits reached $150 million or more in revenues prior to 2011: Endeca tallied an estimated $150 million in revenues, and Autonomy reported $700 million. Both outfits were sold.

Since 2012 exactly zero enterprise search firms have generated more than $700 million in revenues. Now the wild and crazy funding of search vendors has continued apace since 2012. There are a number of search and retrieval companies and some next generation content processing outfits which have ingested tens of millions of dollars.

How many of these outfits have gone public in the zero-cost money environment? Based on my records, zero. Why haven’t Attivio, BA Insight, Coveo, Palantir and others cashed in on their technology, surging revenues, and market demand?

There are three reasons:

  1. The revenues are simply acceptable, not stunning. In the post-Fast Search & Transfer era, twiddling the finances carries considerable risks. Think about a guilty verdict for a search wizard. Yep, bad.
  2. The technology is a rehash gilded with new jargon. Take a look at the search and content processing systems, and you find the same methods and functions that have been known and in use for more than 30 years. The flashy interfaces are new, but the plumbing still delivers precision and recall that have hit a glass ceiling of 80 to 90 percent for the top-performing systems. When one is looking for a recipe, good enough relevance is acceptable. When one is looking for a bad actor, a significant margin for error is not so good.
  3. The smart software performs certain functions at a level comparable to the performance of a subject matter index when certain criteria are met. The notion of human editors riding herd on entity and synonym dictionaries is not one that makes customers weep with joy. Smart software helps with some functions, but today’s systems remain anchored in human operators, and the work these folks have to perform to keep the systems in tip-top shape is expensive. Think about this human aspect in terms of how Palantir explains architects’ changes to type operators or the role of content intake specialists using the revisioning and similar field operations.
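For readers who want the arithmetic behind item 2, here is a minimal Python sketch of how precision and recall are usually computed for a single query. The document IDs and counts are made up for illustration and do not come from any vendor’s system.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: the system returns doc1..doc10; the relevant
# documents in the collection are doc3..doc12. Overlap: doc3..doc10 (8 items).
retrieved = [f"doc{i}" for i in range(1, 11)]
relevant = [f"doc{i}" for i in range(3, 13)]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.80
```

At 80 percent recall, two of every ten relevant documents never show up in the results. For a recipe, no problem. For an investigator, those two may be the ones that matter.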

Why do I make this point in the context of unicorns? Search has one or two unicorns. I would suggest Palantir is a unicorn. When I think of Palantir, I consider this item:

To summarize, only a small number of companies reach the IPO stage.

Also, the HP Autonomy “deal” is a quasi-unicorn. IBM’s investment in Watson is a potential unicorn if and when IBM releases financial data about its TV show champion.

Then there are a number of search and content processing creatures which could be hybrids of a horse and a donkey. The investors are breeders who hope that the offspring become champions. Long shots all.

The Financial Times article expresses a broad concept. The activities of the search and content processing vendors in the next 12 to 18 months will provide useful data about the genetic makeup of some technology lab creations.

Stephen E Arnold, January 4, 2016

Klout Identifies Trendy Experts

January 4, 2016

I read “Top Algorithm, Data Science, Big Data, and Machine Learning Experts.” I am not sure what to make of the write up and the information it presents. The “rankings” are derived from an analysis of Klout scores. I am not a Klout person, and I am not sure what to make of the notion of having one’s influence rated on a scale of one to 100. The Klout score, it seems, reflects an individual’s influence via or “in” social media.

According to the article, a publication about search engine marketing is in the top five experts in algorithms. I assume this means that many folks get their algorithmic guidance from a marketing-oriented publication. A fellow named Vincent Granville, who is pretty good at the Tweeter stuff, is the top expert in Big Data, Data Visualization, Deep Learning, Machine Learning and Statistics. He’s only number 2 in predictive analytics, however.

Interesting. No wonder I have a Klout score of i.

Stephen E Arnold, December 31, 2015

Short Honk: Hadoop Ecosystem Made Clear

January 3, 2016

Love Hadoop. Love all things Hadoopy? You will want to navigate to “The Hadoop Ecosystem Table.” You have categories of Hadoopiness with examples of the Hadoop amoebae. You are able to see where Spark or Kudu “fits.” Need some document data model options? The table will deliver: ArangoDB and more. Useful stuff.

Stephen E Arnold, December 30, 2015

Weekly Watson: In the Real World

January 2, 2016

I want to start off the New Year with a look at Watson in the real world. My real world is circumscribed by abandoned coal mines and hollows in rural Kentucky. I am pretty sure this real world is not the real world assumed in “IBM Watson: AI for the Real World.” IBM has tapped Bob Dylan, a TV game show, and odd-duck quasi-chemical symbols to communicate the importance of search and content processing.

The write up takes a different approach. In fact, the article begins with an interesting comment:

Computers are stupid.

There you go. A snazzy one liner.

The reminder that a man-made device is not quite the same as one’s faithful boxer dog or the next-door neighbor’s teen is startling.

The article summarizes an interview with a Watson wizard, Steven Abrams, director of technology for the Watson Ecosystem. This is one of those PR inspired outputs which I quite enjoy.

The write up quotes Abrams as saying:

“You debug Watson’s system by asking, ‘Did we give it the right data?'” Abrams said. “Is the data and experience complete enough?”

Okay, but isn’t this Dr. Mike Lynch’s approach? Lynch, as you may recall, was the Cambridge University wizard who was among the first to commercialize “learning” systems in the 1990s.

According to the write up:

Developers will have data sets they can “feed” Watson through one of over 30 APIs. Some of them are based on XML or JSON. Developers familiar with those formats will know how to interact with Watson, he [Abrams] explained.

As those who have used the 25-year-old Autonomy IDOL system know, preparing the training data takes a bit of effort. Then, as current content is fed into the Autonomy IDOL system, the humans have to keep an eye on the indexing. Ignore the system too long, and the indexing “drifts”; that is, the learned content is not in tune with the current content processed by the system. Sure, algorithms attempt to keep the calibrations precise, but there is that annoying and inevitable “drift.”
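How might one notice drift before the indexing goes stale? One crude signal, offered here as my assumption for illustration and not as how Autonomy IDOL or Watson actually calibrates, is the share of words in fresh content that never appeared in the training set. The function names, sample headlines, and 0.25 threshold below are hypothetical.

```python
import re

WORD = re.compile(r"[a-z]+")

def tokens(texts):
    # Lowercase word tokens from a batch of texts.
    return [t for text in texts for t in WORD.findall(text.lower())]

def oov_rate(training_texts, new_texts):
    """Fraction of tokens in new_texts that never appear in training_texts."""
    vocab = set(tokens(training_texts))
    new = tokens(new_texts)
    if not new:
        return 0.0
    return sum(1 for t in new if t not in vocab) / len(new)

# Hypothetical training set and a hypothetical batch of this week's content.
training = ["merger talks stall", "quarterly revenue falls"]
this_week = ["blockchain startup raises funding", "quarterly revenue rises"]

rate = oov_rate(training, this_week)
print(f"out-of-vocabulary rate: {rate:.2f}")  # out-of-vocabulary rate: 0.71
if rate > 0.25:  # arbitrary threshold chosen for illustration
    print("drift suspected: time for the humans to review and retrain")
```

Five of the seven new tokens are unseen, so the check fires. A signal like this only says that something changed; deciding what to do about it still falls to the humans.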

IBM’s system, which strikes me as a modification of the Autonomy IDOL approach with a touch of Palantir analytics stirred in, is likely to be one expensive puppy to groom for the dog show ring.

The article profiles the efforts of a couple of IBM “partners” to make Watson useful for the “real” world. But the snip I circled in IBM red-ink red was this one:

But Watson should not be mistaken for HAL. “Watson will not initiate conduct on its own,” IBM’s Abrams pointed out. “Watson does not have ambition. It has no objective to respond outside a query.” “With no individual initiative, it has no way of going out of control,” he continued. “Watson has a plug,” he quipped. It can be disconnected. “Watson is not going to be applied without individual judgment … The final decision in any Watson solution … will always be [made by] a human, being based on information they got from Watson.”

My hunch is that Watson will require considerable human attention. But it may perform best on a TV show or in a motion picture where post production can smooth out the rough edges.

Maybe entertainment is “real”, not the world of a Harrod’s Creek hollow.

Stephen E Arnold, January 2, 2016
