Quid Cheerleading: The Future of Search

March 4, 2016

I read “The Future of {Re}search.” (I love the curly braces.) The write up identifies the four big things in information access. Keep in mind that the write up is a rah rah for Quid, which is okay.

Here are the main points:

  • Semantic search is the next big thing
  • Visualization matters
  • Humans are part of the search process
  • Bots are the “Future of Search.” (The capitalization is from the source document.)

Quid is an interesting company. I thought that the firm was focused on analytics and nifty visualizations. Their catchphrase is “intelligence amplified,” which strikes me as similar to Palantir’s “augmented intelligence.”

If the write up is on the money, Quid is a search vendor in the same way Palantir Technologies is a search vendor.

The point about bots may catch the attention of the ever-alert Connotate folks. I think bots have been an important part of that firm’s services for many years.

So, “the next big thing”? Well, sort of.

Stephen E Arnold, March 4, 2016

Hershey Chocolate: Semi Sweet Analytics?

March 4, 2016

I am wrapping up my profile of Palantir Technologies. I located a couple of references to Palantir’s activities in the non-government markets. One of the outfits allegedly wooed by the Hobbits was Hershey chocolate. A typical reference to the Hobbits and Kisses folks was “Hershey Turns Kisses and Hugs into Hard Data.”

image

When I read “The Hershey Company Partners with Infosys to Build Predictive Analytics Capability using Open Source Information Platform on Amazon Web Services,” I wondered why Palantir Technologies was not featured in the write up. Praescient Analytics, near Washington, DC, can plug industrial strength predictive analytics like Recorded Future’s into a Metropolitan installation without much hassle.

The write up makes clear that the chocolate outfit is going a new way. The path leads through Amazon Web Services to the Infosys Information Platform.

I find this quite a surprise. I have no doubt that Infosys has some competent folks on its team. But the questions flashing through my mind are:

  • What’s up with the Palantir system?
  • Why jump to Infosys when there are darned good outfits available in Boston and Washington, DC?
  • What’s an outsourcing firm able to deliver that specialists with deep experience in making sense of data cannot?

I never understood Mars, and now I don’t understand the makers of the York Peppermint Patty.

Perhaps this is a “whopper” of a project?

Stephen E Arnold, March 4, 2016

Yahoo Has AI Advantage Maybe?

March 2, 2016

I read “Don’t Laugh: Yahoo’s Open Source AI Has a Secret Weapon.” Sorry, I did laugh. I find the Yahooligans’ periodic “we’re really good at technology” messages amusing. More interesting is the willingness of with it magazines to cover these breakthroughs.

I learned:

Yahoo published the source code to its CaffeOnSpark AI engine so that anyone from academic researchers to big corporations can use or modify it.

Good. Open source software is useful, very useful.

I noted this passage:

Yahoo, for example, uses it to improve search results on Flickr by determining the contents of different photos. Instead of relying on the descriptions and keywords entered by the people who upload photos to the site, Yahoo teaches its computers to recognize certain characteristics of a photo, such as specific colors or even objects and animals.

Interesting, but other outfits do image recognition reasonably well. Check out Yandex’s image search or look at the wonky similar images feature that makes it oh, so easy for me to lose my train of thought when looking for examples of Palantir’s interface via Google’s image search service.

I learned:

CaffeOnSpark, as the name suggests, combines two existing technologies: the popular deep learning framework Caffe and the up-and-coming data-crunching system Spark that can run on top of the even more popular big data platform Hadoop. What Yahoo did was simply create a way to run Caffe atop Spark clusters. It can be run either on Spark alone or atop Hadoop. Besides making it easy for AI developers to use familiar tools and avoid moving data around… CaffeOnSpark also makes it relatively easy to distribute deep learning processes across multiple servers, something that the open source version of Google’s TensorFlow can’t yet do.
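To make the "distribute deep learning processes across multiple servers" idea concrete, here is a minimal sketch of synchronous data-parallel training. It is not CaffeOnSpark code and does not use Caffe or Spark. The "workers" are ordinary function calls, the model is a one-variable line fit, and every name (`gradient`, `split`, `train`) is hypothetical. The only point is the pattern: each worker computes a gradient on its own shard of the data, and the averaged gradient drives one shared update.

```python
# Hypothetical sketch of synchronous data-parallel training.
# Not CaffeOnSpark: the "workers" here are plain Python function calls.

def gradient(w, b, shard):
    """Mean-squared-error gradient for y = w*x + b over one shard of (x, y) pairs."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb

def split(data, workers):
    """Cut the data into equal-sized shards, one per worker (any remainder is dropped)."""
    size = len(data) // workers
    return [data[i * size:(i + 1) * size] for i in range(workers)]

def train(data, workers=4, steps=5000, lr=0.01):
    w, b = 0.0, 0.0
    shards = split(data, workers)
    for _ in range(steps):
        # Each worker computes a gradient on its own shard
        # (this loop would run in parallel on a real cluster).
        grads = [gradient(w, b, s) for s in shards]
        # The gradients are averaged and one shared update is applied.
        gw = sum(g[0] for g in grads) / workers
        gb = sum(g[1] for g in grads) / workers
        w -= lr * gw
        b -= lr * gb
    return w, b

data = [(x, 3.0 * x + 1.0) for x in range(8)]  # made-up points on the line y = 3x + 1
w, b = train(data)
print(round(w, 3), round(b, 3))
```

With equal-sized shards the averaged gradient equals the full-batch gradient, so the distributed run follows the same path as a single-machine run. The hard parts in a real system, which this sketch ignores, are moving the gradients between servers and coping with slow or failed workers.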

The challenge for Yahoo is to deal with its here and now problems. The outfit is for sale and many of the researchers of yesteryear have ridden off into the sunrise to find companies able to generate revenue from innovations.

When you are for sale, publicity is a definite plus. By the way, companies with technology to distribute deep learning across multiple servers are chugging along and closing some deals based on their know how. When does open source become a source of revenue and when is it a PR play?

Stephen E Arnold, March 2, 2016

Analytics Reality: Do You Excel?

February 24, 2016

I read “What Is the Most used Feature in Any Business Intelligence Solution? It’s the Export to Excel Button.” The write up asserts:

I was recently forwarded an article on the continued popularity of Excel in the BI community consisting of quotes from 27 experts saying how great and how relevant Excel remains. We do categorize BI as static and historical as opposed to forward looking predictive analytics but I bet it’s still true that Excel is a very widely used tool even by folks that categorize themselves as data scientists.

Let’s assume this is accurate. What does this suggest for complex analytics like my old pals SAS or IBM SPSS? What about high flying outfits like Palantir and Centrifuge Systems?

I have some answers, but I think the questions are suggestive of a hurdle which high horsepower analytic systems must power around. There is a reason so few folks are adept at statistics whether the industrial strength variety or the weird approach taken in social science and economics classes.

Excel seems to be tough to master but compared to more supercharged methods, Excel sure looks like a push pedal tricycle. You can’t go too far or too fast. If you crash into something, there are F1 and semi automated procedures to kiss the boo boo and make it better.

Stephen E Arnold, February 24, 2016

SAP: Statistics Need Sizzle

February 22, 2016

The underlying data? Important, yeah, but the action is Hollywood style graphics. Taking a page from the Palantir game plan, SAP is getting with the visual sizzle program. Navigate to “SAP Buys All the Pretty Data Firm Roambi.” The article states:

The data prettifier’s angle is that it displays data using deliciously slick and dynamically updating charts, graphs and sliders that are native apps for iOS and Android. Roambi’s front ends tap into back ends including Excel, SQL Server, Cognos, Box, Salesforce and – yes – SAP.

Special effects matter in videos, Web pages, and business analytics.

What if the analyst gets the underlying data out of joint? What if the person using the graphic output does not understand what analytic choices were made to give the visual some zing?

What? Who worries about details? It is the visual snap that crackles.

Stephen E Arnold, February 22, 2016

Unicorn Valuations and Paybacks

February 18, 2016

I read “The Terms behind Unicorn Valuations.” Valentine’s Day seemed to be an ideal time to think about billion dollar outfits and those who fund them. Note: I did not get a valentine this year.

The law firm generating the document is a specialist in unicorn analysis. The most recent report represents 2015 data. The major takeaway in my opinion is that 2015 was the Year of the Unicorn. With the Chinese New Year in mind, 2016 is the Year of the Monkey as I recall.

I noted that some investors in unicorns get “downside protection.” I interpreted this to suggest that some folks have a shot at getting some of the money back if their unicorn catches pneumonia or worse. I like the discovery that flexibility may have been used to achieve the $1 billion valuation.

For me the key finding was:

the beginning of the period covered by the survey was markedly stronger than the end of the period covered by the survey.

The economic uncertainty may have been a factor. The report does not dip into psycho socio economic reasons for the dip.

The write up concludes:

companies in need of additional funds might find it necessary to provide new investors liquidation or other rights superior to their unicorn (and other) investors to attract needed capital. This happened frequently when the dotcom bubble burst in the early 2000s. Although the use of structures that reduce or eliminate outstanding investor rights is uncommon during most of the venture cycle, they become more common during significant downturns in the venture economy.

What about the financing of search and content processing vendors? Only one is a unicorn, Palantir. Worth watching.

Net net: No valentines for those who get bitten by a unicorn.

Stephen E Arnold, February 18, 2016

Wrangle That Data: Trifacta Receives $35 Million

February 14, 2016

When I read “Data Cleaning Software Company Trifacta Raises $35 Million,” I realized that the notion of automating the clean up of disparate data was an unsolved problem. Odd. I have been operating on the assumption that tools from Lexmark Kapow and Palantir had tamed that stallion years ago. Wrong.

According to the write up:

New investor Cathay Innovation and existing investors Accel Partners, Greylock Partners, and Ignition Partners participated in the new round. To date, the company has raised more than $76 million, including the $25 million round announced May 2014.

That’s a reasonable chunk of change for a function many search and content processing vendors suggest is a no brainer. Trifacta’s pocketful of cash provides some evidence for the belief that cleaning up data remains a big, big problem.
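A small sketch shows why the clean up is not a no brainer. The two records below are made up, and the function names and the list of accepted date formats are assumptions for illustration, not anything from Trifacta, Kapow, or Palantir. Both records describe the same event, yet they only match after the company name, the date, and the amount have each been normalized.

```python
from datetime import datetime

# Assumed input formats; real data would need many more.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"]

def clean_date(text):
    """Return an ISO date string, or None when no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean_amount(text):
    """Turn strings like '$35 million' or '35,000,000' into a number of dollars."""
    t = text.lower().replace("$", "").replace(",", "").strip()
    scale = 1
    if t.endswith("million"):
        t, scale = t[:-len("million")].strip(), 1_000_000
    return float(t) * scale

def clean_name(text):
    """Collapse stray whitespace and normalize capitalization."""
    return " ".join(text.split()).title()

# Hypothetical records from two "disparate" sources.
rows = [
    {"company": " Example Co ", "date": "02/14/2016", "amount": "$35 million"},
    {"company": "EXAMPLE CO", "date": "February 14, 2016", "amount": "35,000,000"},
]

cleaned = [
    {
        "company": clean_name(r["company"]),
        "date": clean_date(r["date"]),
        "amount": clean_amount(r["amount"]),
    }
    for r in rows
]
print(cleaned[0] == cleaned[1])
```

Three fields and two sources already need three hand-written rules plus a list of date formats. Scale that to dozens of sources and the size of the housekeeping job becomes easier to appreciate.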

Will Trifacta surge to the top of the data clean up pile? If one takes a peek at the azure chip consulting firm reports on this housekeeping sector, there are quite a few vendors chasing customers in this sector.

Now returning to the question about incumbents like Kapow and Palantir. Where are these companies? I can understand why Kapow has slipped from some folks’ radar, but the Palantir operation is active in the commercial sector and seems to have helpers, wizards, and smart software which allows a person with little or no training to import, process, and extract insights from disparate data.

Do those funding Trifacta perceive Kapow and Palantir as companies unable or unwilling to tackle the problems Trifacta addresses? Good question.

Stephen E Arnold, February 14, 2016

A Data Lake: Batch Job Dipping Only

February 11, 2016

I love the Hadoop data lake concept. I live in a mostly real time world. The “batch” approach reminds me of my first exposure to computing in 1962. Real time? Give me a break. Hadoop reminded me of those early days. Fun. Standing on line. Waiting and waiting.

I read “Data Lake: Save Me More Money vs. Make Me More Money.” The article strikes me as a conference presentation illustrated with a deck of PowerPoint goodies.

One of the visuals was a modern big data analytics environment. I have seen a number of representations of today’s big data yadda yadda set ups. Here’s the EMC take on the modernity:

image

Straight away, I note the “all” word. Yep, just put the categorical affirmative into a Hadoop data lake. Don’t forget the video, the wonky stuff in the graphics department, the engineering drawings, and the most recent version of the merger documents requested by a team of government investigators, attorneys, and a pesky solicitor from some small European Community committee. “All” means all, right?

Then there are two “environments”. Okay, a data lake can have ecosystems, so the word environment is okay for flora and fauna. I think the notion is to build two separate analytic subsystems. Interesting approach, but there are platforms which offer applications to handle most of the data slap about work. Why not license one of those; for example, Palantir, Recorded Future?

And that’s it?

Well, no. The write up states that the approach will “save me more money.” In fact, one does not need much more:

The savings from these “Save me more money” activities can be nice with a Return on Investment (ROI) typically in the 10% to 20% range. But if organizations stop there, then they are leaving the 5x to 10x ROI projects on the table. Do I have your attention now?

My answer, “No, no, you do not.”
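For anyone who wants to check the arithmetic behind the quoted ranges, a quick sketch follows. The project cost is a made-up figure, and reading "5x" as a net gain of five times the cost is my assumption. The article does not define the term.

```python
def net_gain(cost, roi):
    """Net gain implied by an ROI expressed as a fraction of cost (0.10 means 10%)."""
    return cost * roi

cost = 1_000_000  # hypothetical project cost in dollars, not from the article

# "5x" and "10x" are read here as 500% and 1,000% of cost (an assumption).
for label, roi in [("10%", 0.10), ("20%", 0.20), ("5x", 5.0), ("10x", 10.0)]:
    print(f"ROI {label}: net gain ${net_gain(cost, roi):,.0f}")
```

On that reading, the low end of the "make me more money" range is fifty times the low end of the "save me more money" range. The size of the gap is not in dispute. Whether a data lake delivers it is a different matter.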

Stephen E Arnold, February

Microsoft AI Faves

February 9, 2016

I noted a blog post called “From Discovery to Selection: Announcing the Seattle Accelerator’s Third Batch.” The post lists companies which Microsoft wants to nurture. Here’s the list:

  • Affinio: Audience insights
  • Agolo: Summarization of text
  • Clarify: Rich media search
  • Defined Crowd: Natural language processing
  • Knomos: Palantir style analysis
  • Medwhat: Doctor made of soft software
  • OneBridge: Middleware for Microsoft cloud
  • Percolata: Retail staff monitoring
  • Plexuss: Palantir style analysis
  • Sim Machines: Similarity search and pattern recognition

Net net: Microsoft continues to hunt for solutions in search and analytics. There is a touch of “me too” in the niche plays too. Persistence is a virtue.

Stephen E Arnold, February 9, 2016

3RDi for Enterprise Search

February 5, 2016

Health and medical search need an upgrade? T/DG 3RDi might be just what the doctor ordered. Your search blues will disappear when you have natural language processing, semantic search, search relevancy, search analytics, research tools, and data integration. Very comprehensive it seems.

T/DG offers 3RDi. Now try to search for these entities. To locate the services firm offering the 3RDi system, one has to figure out how to make Bing, Google, and Yandex point to the correct entities.

Naming products and companies is tricky. Let me save you the hassle of wading through false drops.

  • T/DG means “The Digital Group,” an outfit founded in 1999 and operating from New Jersey.
  • 3RDi means “relevant, deep insights.” (I don’t know what the 3 means.)

The search system appears to be a “platform” based on open source technology. Here’s a block diagram of 3RDi:

image

Source: The Digital Group, 2015

The company’s most recent push is health care. The search system performs the type of functions which I associate with a system like the ones Autonomy and Fast Search & Transfer described in the late 1990s. There is also a hefty dose of “platformitis.” The idea is that a licensee can use the system to meet the needs of users. The support for controlled vocabularies is helpful in domain specific deployments, but these have to be maintained, which can be a financial and resource burden for some licensees.

3RDi embraces the semantic marketing jargon enthusiastically; for example, this diagram shows how “knowledge” and “semantics” make the “experience” work for licensees:

image

Source: The Digital Group, 2015

Users of the system do not have to deal with results lists. The system presents information in a visual manner; for example:

image

Source: The Digital Group, 2015

In short, 3RDi appears to deliver the type of utility I associate with systems from outfits like BAE Systems and Palantir.

If your organization wants an open source system with the bells and whistles found in seven figure platforms, you may want to explore 3RDi.

The urls you need are:

I assume that the company will make the “3” clearer going forward. There is a live demo available. You will need to register. The system balks at non commercial domains like my Yahoo account.

The recent marketing push given 3RDi signals that the enterprise search sector is alive and well. As the company says, “Start experiencing.” I wonder what the “3” means.

Stephen E Arnold, February 5, 2016
