
IBM: Back to Its Roots with Zest, Actually Spark

April 6, 2016

I read “IBM Launches Mainframe Platform for Spark.” This is an announcement which makes sense to me. The Watson baloney annoys; the mainframe news thrills.

According to the write up:

IBM is expanding its embrace of Apache Spark with the release of a mainframe platform that would allow the emerging open-source analytics framework to run natively on the company’s mainframe operating system.

I noted this passage as well:

The IBM platform also seeks to leverage Spark’s in-memory processing approach to crunching data. Hence, the z Systems platform includes data abstraction and integration services so that z/OS analytics applications can leverage standard Spark APIs. That approach eliminates processing and security issues associated with ETL while allowing organizations to analyze data in-place.

Hopefully IBM will play to its strengths and not chase rainbows.

Stephen E Arnold, April 6, 2016

Big Data and Its Fry Cooks Who Clean the Grill

April 1, 2016

I read “Cleaning Big Data: Most Time Consuming, Least Enjoyable Data Science Task, Survey Says.” A survey?

According to the capitalist tool:

A new survey of data scientists found that they spend most of their time massaging rather than mining or modeling data.

The point is that few wizards want to come to grips with the problem of figuring out what’s wrong with data in a set or a stream and then getting the data into a form that can be used with reasonable confidence.

Those exception folders are annoying, aren’t they?

The write up points out that a data scientist spends 80 percent of his or her time doing housecleaning. Skip the job and the house becomes unpleasant indeed.

The survey also reveals that data scientists have to organize the data to be analyzed. Imagine that. The baloney about automatically sucking in a wide range of data does not match the reality of the survey sample.

Another grim bit of drudgery emerges from the sample, which we assume was conducted with the appropriate textbook procedures: the skill most in demand was SQL. Yep, old school.
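What does cleaning the grill look like? Here is a minimal sketch, written for this post and not taken from the survey or the capitalist tool. It assumes a hypothetical feed of (customer, date, amount) rows, tidies the names, shunts malformed rows into an exception list, and lets old school SQL (via Python’s built-in sqlite3 module) do the totaling. The sample rows and field names are invented.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a feed: messy casing,
# stray whitespace, a missing date, and a non-numeric amount.
RAW_ROWS = [
    (" Acme Corp ", "2016-03-01", "120.50"),
    ("acme corp", "2016-03-02", "80"),
    ("Widget Inc", "", "45.00"),
    ("Widget Inc", "2016-03-03", "n/a"),
]


def clean(rows):
    """Split rows into (good, exceptions); good rows come back normalized."""
    good, exceptions = [], []
    for name, day, amount in rows:
        name = " ".join(name.split()).title()  # collapse whitespace, fix case
        try:
            value = float(amount)
        except ValueError:
            exceptions.append((name, day, amount, "bad amount"))
            continue
        if not day:
            exceptions.append((name, day, amount, "missing date"))
            continue
        good.append((name, day, value))
    return good, exceptions


def load_and_summarize(rows):
    """Load the clean rows into SQLite and total them per customer with SQL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (customer TEXT, day TEXT, amount REAL)")
    good, exceptions = clean(rows)
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", good)
    # The old school part: plain SQL does the summarizing.
    totals = con.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    ).fetchall()
    con.close()
    return totals, exceptions


if __name__ == "__main__":
    totals, exceptions = load_and_summarize(RAW_ROWS)
    print(totals)  # [('Acme Corp', 200.5)]
    for row in exceptions:
        print("EXCEPTION:", row)
```

Half of the invented rows end up in the exception list. Someone still has to open that folder.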

Consider that most of the companies marketing next generation data mining and analytics systems never discuss grunt work and old fashioned data management.

Why the disconnect?

My hunch is that it is the sizzle, not the steak, which sells. Little wonder that some analytics outputs might be lab-made hamburger.

Stephen E Arnold, April 1, 2016

Predictive Analytics on a Budget

March 30, 2016

Here is a helpful list from Street Fight that could help small and mid-sized businesses find a data analysis platform that is right for them—“5 Self-Service Predictive Analytics Platforms.”  Writer Stephanie Miles notes that, with nearly a quarter of small and mid-sized organizations reporting plans to adopt predictive analytics, vendors are rolling out platforms for companies with smaller pockets than those of multinational corporations. She writes:

“A 2015 survey by Dresner Advisory Services found that predictive analytics is still in the early stages of deployment, with just 27% of organizations currently using these techniques. In a separate survey by IDG Enterprise, 24% of small and mid-size organizations said they planned to invest in predictive analytics to gain more value from their data in the next 12 months. In an effort to encourage this growth and expand their base of users, vendors with business intelligence software are introducing more self-service platforms. Many of these platforms include predictive analytics capabilities that business owners can utilize to make smarter marketing and operations decisions. Here are five of the options available right now.”

Here are the five platforms listed in the write-up: Versium’s Datafinder; IBM’s Watson Analytics; Predixion, which can run within Excel; Canopy Labs; and Spotfire from TIBCO. See the article for Miles’ description of each of these options.


Cynthia Murrell, March 30, 2016

Sponsored by, publisher of the CyberOSINT monograph



Expert System Does a Me Too Innovation

March 29, 2016

Years ago I was a rental to an outfit called i2 Group in the UK. Please, don’t confuse the UK i2 with the ecommerce i2 which chugged along in the US of A.

The UK i2 had a product called Analyst’s Notebook. At one time it was basking in a 95 percent share of the law enforcement and intelligence market for augmented investigatory software. Analyst’s Notebook is still alive and kicking in the loving arms of IBM.

I thought of the vagaries of product naming when I read “Expert System USA Launches Analysts’ Workspace.”

According to the write up:

Analysts’ Workspace features comprehensive enterprise search and case management software integrated with a customizable semantic engine. It incorporates a sophisticated and efficient workflow process that enables team-wide collaboration and rapid information sharing. The product includes an intuitive dashboard allowing analysts to monitor, navigate, and access information using different taxonomies, maps, and worldviews, as well as intelligent workflow features specifically designed to proactively support analysts and investigators in the different phases of their activities.

The lingo reminds me of the early i2 Group marketing collateral. The terminology has surfaced in some of Palantir’s marketing statements and, quite recently, in the explanation of the venture funded Digital Shadows’ service.

I love me-too products. Where would one be if Mozart had not heard and remembered the note sequences of other composers?

Now the trick will be to make some money. Mozart, though a very good me too innovator, struggled in that department. Expert System, according to Google Finance, is going to have to find a way to keep that share price climbing. Today’s (March 22, 2016) share price is in penny stock territory.


Stephen E Arnold, March 29, 2016

Elasticsearch for Text Analysis

March 29, 2016

Short honk: Put your code hat on. “Mining Mailboxes with Elasticsearch and Kibana” walks a reader through using open source technology to do text analysis. The example under the microscope is email, but the method will work for any text corpus ingested by Elasticsearch. The write up includes code samples and enough explanation to get the Elastic system moving forward. Visualizations are included. These make it easy to spot certain trends; for example, the top recipients of the email analyzed for the tutorial. Worth a look.
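For readers who want the gist before installing the Elastic stack: the “top recipients” view boils down to counting addresses in the To and Cc headers. The sketch below is not the tutorial’s code. It is a standard library Python approximation that assumes a few invented messages held in memory; a real run would read a mailbox file, and the tutorial leans on Elasticsearch and Kibana for that job.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical sample messages; a real run would read an mbox file
# (for example with the standard library's mailbox.mbox).
SAMPLE_MESSAGES = [
    "From: alice@example.com\nTo: bob@example.com, carol@example.com\nSubject: Q1\n\nHi",
    "From: bob@example.com\nTo: carol@example.com\nCc: dave@example.com\nSubject: Re: Q1\n\nOk",
    "From: dave@example.com\nTo: Carol <CAROL@example.com>\nSubject: Lunch\n\nNoon?",
]


def top_recipients(raw_messages, n=3):
    """Count To/Cc addresses across messages and return the n most common."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        fields = msg.get_all("To", []) + msg.get_all("Cc", [])
        for _name, addr in getaddresses(fields):
            if addr:
                counts[addr.lower()] += 1  # normalize case so duplicates merge
    return counts.most_common(n)


if __name__ == "__main__":
    for addr, count in top_recipients(SAMPLE_MESSAGES):
        print(f"{count:3d}  {addr}")
```

Lowercasing the addresses is a design choice: without it, “CAROL@example.com” and “carol@example.com” would count as two people.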

Stephen E Arnold, March 29, 2016

Retraining the Librarian for the Future

March 28, 2016

The Internet is often described as the world’s biggest library, one containing all the world’s knowledge that someone dumped on the floor. The Internet is the world’s biggest information database as well as the world’s biggest data mess. In the olden days, librarians were the gateway to knowledge management, but now they need to expand their skills beyond the Dewey Decimal System and database searching. Librarians need to do more, and Christian Lauersen’s personal blog explains how in “Data Scientist Training For Librarians-Re-Skilling Libraries For The Future.”

DST4L (Data Scientist Training for Librarians) is a boot camp where librarians and other information professionals learn new skills to stay relevant. Lauersen describes last year’s session:

“DST4L has been held three times in The States and was to be set for the first time in Europe at Library of Technical University of Denmark just outside of Copenhagen. 40 participants from all across Europe were ready to get their hands dirty over three days marathon of relevant tools within data archiving, handling, sharing and analyzing. See the full program here and check the #DST4L hashtag at Twitter.”

Over the course of three days, the participants learned about OpenRefine, a spreadsheet-like application that can be used for data cleanup and transformation. They also learned about the benefits of GitHub and how to program in Python. These skills go well beyond the classes taught in library graduate programs, but it is a good sign that the profession is evolving even if academia lags behind.
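OpenRefine’s best known cleanup trick is clustering near-duplicate values. The sketch below approximates its “fingerprint” key collision method in Python: lowercase, fold accents to ASCII, strip punctuation, then sort and de-duplicate the tokens so variant spellings share a key. It is a simplified approximation written for illustration, not OpenRefine’s own code, and the sample names are made up.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified approximation of an OpenRefine-style fingerprint key."""
    value = value.strip().lower()
    # Fold accented characters to plain ASCII (for example "é" becomes "e").
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation; keep letters, digits, and whitespace.
    value = re.sub(r"[^\w\s]", "", value)
    # Sorting unique tokens makes word order and repeats irrelevant.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values by fingerprint and keep only groups with 2+ members."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    names = [  # hypothetical messy spreadsheet column
        "Technical University of Denmark",
        "technical university of denmark ",
        "Denmark, Technical University of",
        "Copenhagen Business School",
    ]
    for group in cluster(names):
        print(group)
```

A librarian would then pick one canonical spelling for each cluster, which is roughly the workflow OpenRefine wraps in a point-and-click dialog.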

Whitney Grace, March 28, 2016


Hot Data Startups to Notice

March 22, 2016

An outfit called UBM, which looks a lot like the old IDC I knew and loved, published “9 Hot Big Data and Analytics Startups to Watch.” The article is a series of separate pages. Apparently the lust for clicks is greater than the MBAs’ interest in making information easy to access. Progress in online publishing is zipping right along the information highway it seems.

Which companies are the article and UBM describing as “hot”? I interpret the word to mean “having a high degree of heat or a high temperature” or “(of food) containing or consisting of pungent spices or peppers that produce a burning sensation when tasted.” I have a hunch the use of the word in this write up is intended to suggest big revenue producers which you must license in order to get or keep a job. Just a guess, mind you.

The companies are:

AtScale, founded in 2013

Algorithmia, founded in 2013

Bedrock Data, founded in 2012

BlueTalon, founded in 2013

Cazena, founded in 2014

Confluent, founded in 2014, founded in 2011

RJMetrics, founded in 2008

Wavefront, founded in 2013

The list is US-centric. I assume none of the Big Data and analytics outfits in other countries are “hot.” I think the reason is that the research process looked at Boston, Seattle, and the Sillycon Valley pool and thought, “Close enough for horseshoes.” Just a guess, mind you.

If you are looking for the next big thing founded within the last two to eight years, the list is just what you need to make your company or organization great again. Sorry, some catchphrases are tough to purge from my addled goose brain. Enjoy the listicle. On high latency systems, the slides don’t render. Again. Do MBAs worry about this stuff? A final comment: I like the name “BlueTalon.”

Stephen E Arnold, March 22, 2016

Need a Classification Algorithm or 17?

March 21, 2016

I gave a lecture a couple of years ago about the similarity among major content processing systems. In that talk, I focused on 10 numerical recipes which our research identified in the commercial products from a number of well-known intelligence platform vendors. The point of the lecture was to underscore the baked-in weaknesses of platforms which use procedures taught in many universities. Outputs often vary because of the goofy decisions humans make or because the underlying data pumped into the numerical recipes is flawed.

I want to call your attention to “Implementation of 17 Classification Algorithms in R.” If you want to see how much the outputs of classification algorithms differ, just fire up your system, implement these 17 methods, and check out the results. Our research reiterated to my goslings that one can select a classification algorithm to produce the type of output desired by the system engineer. Yep, put your hands on the steering wheel and drive that output pretty much where you want it to go. Do users of content processing systems know about these baked-in, pre-loaded destinations? Nah.
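The R write up is the place to see all 17. For the point in miniature, here is a Python sketch (our choice of language, not the article’s) that pits two textbook classifiers, 1-nearest-neighbor and nearest-centroid, against the same invented training data. One stray point is enough to make them disagree about the same query.

```python
import math

# Hypothetical toy training set of (x, y) points with labels.
# Class "A" has one stray point sitting near class "B".
TRAIN = [
    ((0.0, 0.0), "A"), ((1.0, 0.0), "A"), ((0.0, 1.0), "A"), ((4.0, 4.0), "A"),
    ((5.0, 6.0), "B"), ((6.0, 5.0), "B"), ((6.0, 6.0), "B"),
]


def nearest_neighbor(train, point):
    """1-NN: copy the label of the single closest training point."""
    _, label = min(train, key=lambda item: math.dist(item[0], point))
    return label


def nearest_centroid(train, point):
    """Average each class into one centroid, then pick the closest centroid."""
    sums = {}
    for (x, y), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    centroids = {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


if __name__ == "__main__":
    query = (4.5, 4.5)
    print("1-NN says:            ", nearest_neighbor(TRAIN, query))  # A
    print("Nearest centroid says:", nearest_centroid(TRAIN, query))  # B
```

Same data, same question, two answers. The engineer who picks the method picks the output.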

Stephen E Arnold, March 21, 2016

Google Decides to Be Nice to

March 18, 2016

Google is renowned for its technological endeavors, beautiful office campuses, smart employees, and a culture full of self-absorbed, competitive people. While Google might have a lot of perks, it also has its dark side. According to Quartz, Google wanted to build more productive teams, so it launched Project Aristotle to analyze how. The result: “After Years Of Intensive Analysis, Google Discovers The Key To Good Teamwork Is Being Nice.”

Project Aristotle studied hundreds of employees in different departments and analyzed their data. The researchers wanted to find a “magic formula,” but it all boils down to one of the things taught in kindergarten: be nice.

“Google’s data-driven approach ended up highlighting what leaders in the business world have known for a while; the best teams respect one another’s emotions and are mindful that all members should contribute to the conversation equally. It has less to do with who is in a team, and more with how a team’s members interact with one another.”

The best teams have members who understand and respect one another and allow each other to contribute to the conversation equally. It is a basic human tenet and even one of the better ways to manage a relationship, according to marriage therapists around the world. Another finding of the project is dubbed “psychological safety,” where team members create an environment with the established belief they can take risks and share ideas without ridicule.

Will psychological safety be a new buzzword since Google has “discovered” that being nice works so well? The term has been around for a while, at least since 1999.

Google’s research yields a business practice that other companies have already adopted: Costco, Trader Joe’s, Pixar, and Sassie, to name a few. Yet why is it so hard to be nice?


Whitney Grace, March 18, 2016

Gartner and the Business Intelligence Magic Quadrant: Lots of Explaining, Lots of Subjectivity It Seems

March 13, 2016

I read a downright weird article/interview called “Big Data Discovery may put Oracle back in BI Magic Quadrant.” The title contains the magic word “may”, which does not promise to make Oracle a big dot in a Gartner Magic Quadrant, but it suggests that Gartner is doing some explaining.

As I understand the situation, the mid tier consulting firm analyzed the business intelligence sector and figured out which companies were winners and losers. Well, that’s the lingo that the original Boston Consulting Group quadrant used, and that’s how General Eisenhower used his quadrant. So those approaches override the Gartner words like niche players and visionaries. (Is it not possible for a niche player to be a visionary? Does Gartner know “Venn” to check its logic?)

The point of the write up is that Oracle, one of the big dogs in the Department of Defense’s DCGS-A and DCGS-N mash up analytics initiative, is not in the Gartner magic square thing. Nope. Deleted.

“Why?” may be a question which some folks at Oracle have been asking. The article/interview appears to be an “explainer” to make the Gartner mid tier method appear closer to the top drawer in the cabinet of analytics collectibles.

I noted this passage:

Question: It sounds like the change isn’t coming from something Oracle did, but from Gartner.

Gartner’s R&D Big Dog, Josh Parenteau: Right, OBIEE is still there. It’s still being sold as their platform, but it does not meet the modern definition of the Magic Quadrant right now.

The acronym OBIEE stands for Oracle Business Intelligence Enterprise Edition; in short, Oracle analytics. You, gentle reader, knew that.

Oracle was excluded because “they didn’t fully participate,” says Parenteau. He adds:

I do think that they’re late to the game by quite a bit… For Oracle, it’s recognizing the signals a bit earlier. It’s responding to customer needs and, I think, realizing that it’s not just about product. You can have the best product in the world, but if customers don’t want to work with you because they don’t like the relationship, it’s not going to matter.

So what companies of note made the Magic Quadrant? Since I don’t pay Gartner to advise me, I checked Bing and Google to locate the 2016 Magic Quadrant for Business Intelligence. It did not take long, because this MQ report appears to be a marketing item, not a confidential study like a report about the AVATAR program.

Check out these outfits who have met the Gartner criteria, objective and subjective:

  • BeyondCore
  • Domo
  • Logi Analytics
  • Platfora
  • Sisense

Okay, some names of note.

These outfits made the list as well:

  • IBM
  • Microsoft
  • SAS

I highlighted this paragraph as particularly suggestive:

But I would say that, if you are a member of the install base of Oracle, know that they do have offerings in the space. They just didn’t have enough traction to get on the quadrant. If you have a big data Hadoop initiative going on, of course look at Big Data Discovery, because that’s exactly what it’s focused on. If you are looking for a tool to do data discovery, of course look at Visual Analyzer, which is part of the cloud service. If you have an initiative to get into the cloud, look at BICS. I wouldn’t say that, just because they’re not on the Magic Quadrant, if you’re an existing Oracle customer that you shouldn’t continue to look at them for solutions. This doesn’t mean that they are gone forever or off the MQ forever. It’s a transition. We’re in a market that is transitioning. Next year, it may be a new ball game.

Very mid tier. I liked the “you shouldn’t continue to look at them for solutions.” Are those words a positive or a negative? Worth watching the interaction of the Oracle folks with the Gartner experts.

Stephen E Arnold, March 13, 2016
