Crazy Numbers Department: Big Data Spending in 2019

November 26, 2015

It is almost 2016. IDC, an outfit owned by an optimistic outfit, has taken a tiny step forward. The IDC wizards answered this question, “How big will Big Data spending be in 2019?” Yep, that is 36 months in the future. There might be more money in predicting Super Bowl winners, what stock to pick, and the steps to take to minimize risk at a restaurant. But no.

According to the true believers in the Content Loop, “IDC Says Big Data Spending to Hit 48.6 Billion in 2019.” I like that point six, which seems to suggest that real data were analyzed exhaustively.

The write up reports:

The market for big data technology and services will grow at a compound annual growth rate (CAGR) of 23 percent through 2019, according to a forecast issued by research firm International Data Corp. (IDC) on Monday. IDC predicts annual spending will reach $48.6 billion in 2019. IDC divides the big data market into three major submarkets: infrastructure, software and services. The research firm expects all three submarkets to grow over the next five years, with software — information management, discovery and analytics and applications software — leading the charge with a CAGR of 26 percent.
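The CAGR arithmetic can be run backward to see what starting point the forecast implies. A minimal Python sketch follows. It assumes the 23 percent rate compounds over five years ending in 2019. The write up says "the next five years," but the base year and base value here are assumptions, not IDC figures.

```python
def implied_base(final_value: float, rate: float, years: int) -> float:
    """Starting value implied by a final value and a CAGR.

    Solves final = base * (1 + rate) ** years for base.
    """
    return final_value / (1 + rate) ** years


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# Assumption: five compounding years (2014 -> 2019); IDC's actual base may differ.
base_2014 = implied_base(48.6, 0.23, 5)
print(f"Implied 2014 spending: ${base_2014:.1f} billion")
print(f"Round-trip CAGR: {cagr(base_2014, 48.6, 5):.2%}")
```

Under those assumptions, the implied 2014 figure is roughly $17.3 billion. Shift the base year or the rate a little, and the "point six" wobbles accordingly.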

I will go out on a limb. I predict that IDC will offer for sale three reports, maybe more. I hope the company communicates with its researchers to avoid the mess created when IDC wizard Dave Schubmehl tried to pitch eight pages of wonderfulness based on my research for a mere $3,500 without my permission. Ooops. Those IDC folks are too busy to do the contract thing, I assumed.

A Schubmehl-type IDC wizard offered this observation with only a soupçon of jargon:

The ever-increasing appetite of businesses to embrace emerging big data-related software and infrastructure technologies while keeping the implementation costs low has led to the creation of a rich ecosystem of new and incumbent suppliers…. At the same time, the market opportunity is spurring new investments and M&A activity as incumbent suppliers seek to maintain their relevance by developing comprehensive solutions and new go-to-market paths. – Ashish Nadkarni, program director, Enterprise Servers and Storage, IDC

Yes, ever increasing and go-to-market spirit. Will the concept apply to IDC’s revenues? Those thrilled with the Big Numbers are the venture folks pumping money into Big Data companies with the same enthusiastic good cheer that Russian ground support troops are sending along with the Backfires, Bears, and Blackjacks bound for Syria.

Thinking about international tension, my hunch is that the global economy is a bit dicey, maybe unstable, at this time. I am not too excited at the notion of predicting what will happen in all things digital in the next few days. Years. No way, gentle reader.

Thinking about three years in the future strikes me as a little too bold. I wonder if the IDC predictive methods have been applied to DraftKings and FanDuel games.

Stephen E Arnold, November 26, 2015

Interview with Informatica CEO

November 26, 2015

Blogger and Datameer CEO Stefan Groschupf interviews Anil Chakravarthy, acting CEO of Informatica, in a series of posts on his blog, Big Data & Brews. The two executives discuss security in the cloud, data infrastructure, schemas, and the future of data. There are four installments as of this writing, but it was an exchange in the second installment, “Big Data & Brews: Part II on Data Security with Informatica,” that captured our attention. Here’s Chakravarthy’s summary of the opportunity now facing his company:

Stefan: From your perspective, where’s the biggest growth opportunity for your company?

Anil: We look at it as the intersection of what’s happening with the cloud and big data. Not only the movement of data between on-premise and cloud and within cloud to cloud but also just the sheer growth of data in the cloud. This is a big opportunity. And if you look at the big data world, I think a lot of what happens in the big data world from our perspective, the value, especially for enterprise customers, the value of big data comes from when they can derive insights by combining data that they have from their own systems, etc., with either third-party data, customer-generated data, machine data that they can put together. So, that intersection is good for, and we are a data infrastructure provider, so those are the two big areas where we see opportunity.

It looks like Informatica is poised to make the most of the changes prompted by cloud technology. To check out the interview from the beginning, navigate to the first installment, “Big Data & Brews: Informatica Talks Security.”

Informatica offers a range of data-management and integration tools. Though the company has offices around the world, it maintains its headquarters in Redwood City, California. It is also hiring as of this writing.

Cynthia Murrell, November 26, 2015

Sponsored by, publisher of the CyberOSINT monograph


No Mole, Just Data

November 23, 2015

It all comes down to putting together the pieces, we learn from Salon’s article, “How to Explain the KGB’s Amazing Success Identifying CIA Agents in the Field?” For years, the CIA was convinced there was a Soviet mole in its midst; how else to explain the uncanny knack of the 20th Century’s KGB for identifying CIA agents? Now we know it was due to the brilliance of one data-savvy KGB agent, Yuri Totrov, who analyzed the U.S. government’s personnel data to separate the spies from the rest of our workers overseas. The technique was very effective, and all without the benefit of today’s analytics engines.

Totrov began by searching the KGB’s own data, and that of allies like Cuba, for patterns in known CIA agent postings. He also gleaned a lot of info from publicly available U.S. literature and from local police. Totrov was able to derive 26 “unchanging indicators” that would pinpoint a CIA agent, as well as many other markers that were less universal but still useful. These included CIA agents driving the same car and renting the same apartment as their immediate predecessors. Apparently, logistics agents back at Langley did not foresee that such consistency, though cost-effective, could be used against us.

Historian Jonathan Haslam elaborates:

“Thus one productive line of inquiry quickly yielded evidence: the differences in the way agency officers undercover as diplomats were treated from genuine foreign service officers (FSOs). The pay scale at entry was much higher for a CIA officer; after three to four years abroad a genuine FSO could return home, whereas an agency employee could not; real FSOs had to be recruited between the ages of 21 and 31, whereas this did not apply to an agency officer; only real FSOs had to attend the Institute of Foreign Service for three months before entering the service; naturalized Americans could not become FSOs for at least nine years but they could become agency employees; when agency officers returned home, they did not normally appear in State Department listings; should they appear they were classified as research and planning, research and intelligence, consular or chancery for security affairs; unlike FSOs, agency officers could change their place of work for no apparent reason; their published biographies contained obvious gaps; agency officers could be relocated within the country to which they were posted, FSOs were not; agency officers usually had more than one working foreign language; their cover was usually as a ‘political’ or ‘consular’ official (often vice-consul); internal embassy reorganizations usually left agency personnel untouched, whether their rank, their office space or their telephones; their offices were located in restricted zones within the embassy; they would appear on the streets during the working day using public telephone boxes; they would arrange meetings for the evening, out of town, usually around 7.30 p.m. or 8.00 p.m.; and whereas FSOs had to observe strict rules about attending dinner, agency officers could come and go as they pleased.”

In the era of Big Data, it seems like common sense to expect such deviations to be noticed and correlated, but it was not always so obvious. Nevertheless, Totrov’s methods did cause embarrassment for the agency when they were revealed. Surely, the CIA has changed its logistical ways dramatically since then to avoid such discernible patterns. Right?
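Totrov's approach amounts to checking each personnel record against a list of indicators and flagging records that match many of them. A minimal Python sketch of that idea follows. The record fields, the indicator rules, the pay-grade cutoff, and the threshold are all hypothetical. They are loosely drawn from Haslam's list and are not Totrov's actual 26 indicators.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class PersonnelRecord:
    # Hypothetical fields, loosely modeled on indicators in Haslam's list.
    name: str
    entry_pay_grade: int            # higher number = higher starting pay
    foreign_languages: int          # number of working foreign languages
    in_state_dept_listing: bool     # appears in State Department listings
    same_car_as_predecessor: bool
    same_apartment_as_predecessor: bool
    cover_title: str                # e.g. "political", "consular", "economic"


COVER_TITLES = {"political", "consular", "vice-consul"}

# Each hypothetical indicator returns True when a record shows the pattern.
INDICATORS: dict[str, Callable[[PersonnelRecord], bool]] = {
    "high_entry_pay": lambda r: r.entry_pay_grade >= 7,
    "multiple_languages": lambda r: r.foreign_languages > 1,
    "missing_from_listings": lambda r: not r.in_state_dept_listing,
    "inherited_car": lambda r: r.same_car_as_predecessor,
    "inherited_apartment": lambda r: r.same_apartment_as_predecessor,
    "political_or_consular_cover": lambda r: r.cover_title in COVER_TITLES,
}


def matched_indicators(record: PersonnelRecord) -> list[str]:
    """Names of the indicators a single record matches."""
    return [name for name, test in INDICATORS.items() if test(record)]


def flag(records: list[PersonnelRecord], threshold: int = 4) -> list[tuple[str, int]]:
    """Records matching at least `threshold` indicators, highest count first."""
    counts = [(r.name, len(matched_indicators(r))) for r in records]
    flagged = [c for c in counts if c[1] >= threshold]
    return sorted(flagged, key=lambda c: c[1], reverse=True)


staff = [
    PersonnelRecord("Officer A", 9, 2, False, True, True, "vice-consul"),
    PersonnelRecord("Officer B", 5, 1, True, False, False, "economic"),
]
print(flag(staff))  # [('Officer A', 6)]
```

No single indicator proves anything. The signal comes from many weak markers lining up on the same record, which is why a count with a threshold is a natural first cut.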

Cynthia Murrell, November 23, 2015

Data and Information: Three Cs for an Average Grade

November 21, 2015

I read “Why Companies Are Not Engaging with Their Data.” The write up boils down the “challenge” to three Cs; that is, a mnemonic which makes it easy to pinpoint Big Data clumsiness.

The three Cs are:

  • Callowness
  • Cost
  • Complexity

How does one get past the notion of inexperience? I suppose one muddles through grade school, high school, college, and maybe graduate school. Then one uses “experience” to get a job and one can repeat this process with Big Data. How many organizations will have an appetite for the organic approach to inexperience? Not many, I assert. We live in a quick fix, do it now environment which darned well better deliver an immediate payoff or “value.” Big Data may require experience, but the real world wants instant gratification.

Cost remains a bit of a challenge, particularly when revenues are under pressure. Data analytics can be expensive when done correctly and really costly if done incorrectly.

Complexity. Math remains math. Engineering data management systems tickles the fancy of problem solvers. Combine the two, and the senior management of many firms is essentially clueless about what is required to deliver outputs which are on the money and within budget.

The write up states:

As a recent report from Ernst & Young points out, ‘Most organizations have complex and fragmented architecture landscapes that make the cohesive collation and dissemination of data difficult.’

In short, big hat, no cattle. Just like the promises of enterprise search vendors to make information accessible to those making business decisions, the verbal picture painted by marketers is more enticing than the shadow cast by Big Data’s Cs. I see that.

Stephen E Arnold, November 21, 2015

Predictions for a Big Data Future

November 19, 2015

Want to know what the future will look like? Navigate to “7 Reasons Why the Algorithmic Business Will Change Society.” The changes come via Datafloq via a mid tier consulting firm. I find the predictions oddly out of step with the milieu in which I live. That’s okay but this list of seven changes raises a number of questions and seems to sidestep some of the social consequences of the world foreshadowed in the predictions. Finding information is, let me say at the outset, not part of the Big Data future.

Here are the seven predictions:

    1. By 2018, 20% of all business content will be authored by machines, which means a hiring freeze on copywriters in favor of robowriting algorithms;
    2. By 2020, autonomous software agents, or algorithms, outside human control, will participate in 5% of all economic transactions, thanks to, among others, blockchain. On the other hand, we will need pattern-matching algorithms to detect robot thieves.
    3. By 2018, more than 3 million workers globally will be supervised by a “roboboss”. These algorithms will determine what work you would need to do.
    4. By 2018, 50% of the fastest growing companies will have fewer employees than smart machines. Companies will become smaller due to expanding presence of algorithms.
    5. By 2018, customer digital assistants will recognize individuals by face and voice across channels and partners. Although this will benefit the customer, organizations should avoid the creepiness factor.
    6. By 2018, 2 million employees will be required to wear health and fitness tracking devices. The data generated from these devices will be monitored by algorithms, which will inform management on any actions to be taken.
    7. By 2020, smart agents will facilitate 40% of mobile transactions, and the post-app era will begin to dominate, where algorithms in the cloud guide us through our daily tasks without the need for individual apps.

Fascinating. Who will work? What will people do in a Big Data world? What about social issues? How will one find information? What happens if one or more algorithms drift and deliver flawed outputs?

No answers of course, but that’s the great advantage of talking about a digital future three or more years down the road. I assume folks will have time to plan their Big Data strategy for this predicted world. I suppose one could ask Google, Watson, or one’s roboboss.

Stephen E Arnold, November 19, 2015

A Modest Dust Up between Big Data and Text Analytics

November 18, 2015

I wonder if you will become involved in this modest dust up between the Big Data folks and the text analytics adherents. I know that I will sit on the sidelines and watch the battle unfold. I may be mostly alone on that fence for three reasons:

  • Some text analytics outfits are Big Data oriented. I would point modestly to Terbium Labs and Recorded Future. Both do the analytics thing and both use “text” in their processing. (I know that learning about these companies is not as much fun as reading about Facebook friends, but it is useful to keep up with cutting edge outfits in my opinion.)
  • Text analytics can produce Big Data. I know that sounds like a fish turned inside out. Trust me. It happens. Think about some wan government worker in the UK grinding through Twitter and Facebook posts. The text analytics output lots of data.
  • A faux dust up is mostly a marketing play. I enjoyed search and content processing vendor presentations which pitted features of one system versus another. This approach is not too popular because every system says it can do what every other system can do. The reality of the systems is, in most cases, not discernible to the casual failed webmaster now working as a “real” wizard.

Navigate to “Text Analytics Gurus Debunk 4 Big Data Myths.” You will learn that there are four myths which are debunked. Here are the myths:

  1. Big Data survey scores reign supreme. Hey, surveys are okay because outfits like SurveyMonkey and the crazy pop up technology from that outfit in Michigan are easy to implement. Correct? Not important. Usable data for marketing? Important.
  2. Bigger social media data analysis is better. The outfits able to process the real time streams from Facebook and Twitter have lots of resources. Most companies do not have these resources. Ergo: Statistics 101 reigns no matter what the marketers say.
  3. New data sources are the most valuable. The idea is that data which are valid, normalized, and available for processing trump bigness. No argument from me.
  4. Keep your eye on the ball by focusing on how customers view you. Right. The customer is king in marketing land. In reality, the customer is a code word for generating revenue. Neither Big Data nor text analytics produces enough revenue in my world view. Sounds great though.
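The "Statistics 101" point in item 2 can be made concrete. The margin of error for a proportion estimated from a simple random sample shrinks only with the square root of the sample size. A modest sample can therefore answer many of the same questions as the full stream. A minimal Python sketch follows. It assumes a simple random sample, the usual normal approximation, and 95 percent confidence (z of about 1.96).

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def sample_size_for(moe: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest n whose margin of error is at most `moe` (p = 0.5 is the worst case)."""
    return math.ceil((z / moe) ** 2 * p * (1 - p))


for n in (1_000, 10_000, 1_000_000):
    print(f"n={n:>9,}: +/- {margin_of_error(0.5, n):.2%}")

print(sample_size_for(0.03))  # 1068 posts for a +/- 3 point margin
```

Under these assumptions, going from 10,000 to 1,000,000 randomly sampled posts narrows the margin from about one percentage point to about 0.1.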

Will Big Data respond to this slap down? Will text analytic gurus mount their steeds and take another run down Marketing Lane to the windmill set up as a tourist attraction in an Amsterdam suburb?

Nope. The real battle involves organic, sustainable revenues. Talk is easy. Closing deals is hard. This dust up is not a mixed martial arts pay per view show.

Stephen E Arnold, November 18, 2015

Deleting Data: Are They Really Gone?

November 17, 2015

I read “Gawker Media’s Data Guru Presents the Case for Deleting Data.” The main idea: hoarding may sustain a reality TV program, but hoarding data may not be good TV.

The write up points out that data cleaning is not cheap. Storage also costs money.

A Gawker wizard is quoted as saying:

We effectively are setting traps in our data sets for our future selves and our colleagues… Increasingly, I find that eliminating this data from our databases is the best solution. Gawker’s traffic data is maintained for just a few months. In our own logs and databases, we only have traffic data since February, and even that’s of limited use: We’ll toss some of it before the end of the year.

Seems reasonable. However, there may be instances when dumping or just carelessly overwriting log files might not be expedient or legal. For example, in one government agency, the secretary’s “bonus” depends on showing how Internet site usage relates to paperwork reduction. The idea is that when a “customer” of the government uses a Web site and does not show up in person at an office to fill out a request, the “customer” allegedly gets better service and costs, in theory, should drop. Also, some deals require that data be retained. You can use your imagination if you are an ISP in a country recently attacked by terrorists and your usage logs are “disappeared.” SEC and IRS retention guidelines? Worth noting in some cases.
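Gawker's approach (keep traffic data for a few months, then toss it) can coexist with the retention obligations above if the purge rule makes an exception for records that must be kept. A minimal Python sketch follows. The 90-day window, the record layout, and the per-record `legal_hold` flag are hypothetical. Actual retention periods come from contracts, regulators such as the SEC and IRS, and counsel, not from code like this.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class LogRecord:
    # Hypothetical traffic-log entry.
    day: date
    path: str
    legal_hold: bool = False  # True when a contract, regulator, or court requires keeping it


def apply_retention(
    records: list[LogRecord], today: date, keep_days: int = 90
) -> tuple[list[LogRecord], list[LogRecord]]:
    """Split records into (kept, purged) under an age-based policy.

    Anything on legal hold is kept regardless of age.
    """
    cutoff = today - timedelta(days=keep_days)
    kept: list[LogRecord] = []
    purged: list[LogRecord] = []
    for rec in records:
        if rec.legal_hold or rec.day >= cutoff:
            kept.append(rec)
        else:
            purged.append(rec)
    return kept, purged


logs = [
    LogRecord(date(2015, 2, 1), "/old-story"),
    LogRecord(date(2015, 2, 1), "/agency-usage-report", legal_hold=True),
    LogRecord(date(2015, 11, 1), "/new-story"),
]
kept, purged = apply_retention(logs, today=date(2015, 11, 17))
print([r.path for r in kept], [r.path for r in purged])
```

Purging records this way touches only the primary store. It does nothing about automatic backups or copies made by intermediaries, and that is how deleted data turn into zombies.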

The question is, “Are data really gone once deleted?” Automatic backups, services in the middle routinely copying data, and other ways of creating unobserved backups may mean that deleted data can come back to life.

Pragmatism and legal constraints as well as the “men in the middle” issue can create zombie data, which, unlike the fictional zombies, can bite.

Stephen E Arnold, November 17, 2015

Quote to Note: Big Data Must Be Small

November 16, 2015

The consulting firm KPMG China tweeted a quote I found worthy of my Quote to Note folder. You may be able to read this gem in this tweet, at least for now.

Here’s the quote attributed to Dr. Mark Kennedy, who I presume is either a KPMG expert or an advisor to the blue chip firm:

To get value from Big Data, make it small.

That quote seems to complement the definition in “Big Data Explained in Less Than 2 Minutes to Absolutely Anyone”; to wit:

The basic idea behind the phrase ‘Big Data’ is that everything we do is increasingly leaving a digital trace (or data), which we (and others) can use and analyze. Big Data therefore refers to that data being collected and our ability to make use of it.

Does this mean that Big Data are just data with spray-on marketing silicone? Definitions of big and small might be helpful. The fish I caught last summer was this big.

Stephen E Arnold, November 16, 2015

More Big Data Value Floundering

November 15, 2015

Here in Harrod’s Creek, Kentucky, the mist is rising from the mine drainage ditch. Value is calculated in a couple of easy ways. Here are two concrete examples:

One of my neighbors buys my collection of used auto parts. Before he puts the parts in his truck, a 1950 Chevrolet, he pays me cash money. Once I count the money, I help him load the parts and watch him drive away in a haze of Volkswagen type emissions.

Here’s another:

A person calls me and wants to talk with me about enterprise search and content processing. I explain that I don’t “talk” for free. If the caller transfers cash money to my PayPal account, then I call the person and answer questions. The payment buys minutes. When the minutes are consumed, I hang up.

The notion of value, therefore, is focused on cash, not feeling good, having a nice day, or winning an election as the friendliest retired consultant in Harrod’s Creek.

Now navigate to “What Is the Value of Big Data to Your Business?” There is a gap between my definition of value and the definition of value set forth in this write up.

Here’s an example of Big Data value:

Big data and how it shapes your company

Big data is at the center of many decisions in any company. It will allow your company to:

Reduce and manage risk

Without data, organizations are vulnerable to many risks. Big data allows financial institutions to profile their customers when giving them credit facilities. Insurance companies can also create risk profiles which will allow them to set appropriate premiums for different customers. Agricultural enterprises as well, can use data on weather and food pricing to control production.

Better decision making

Collecting data on employees’ interests, behavior, interactions, work time, resource use and resource allocation can be very instrumental in creating better structures, improving the flow of information, increasing inter-departmental cooperation, increasing efficiency, saving time and saving resources.

Get a competitive edge

Monitoring competitor products, marketing activities, sales and pricing will help you to respond urgently with your own counter measures. If you are selling your products on a platform like Amazon, you can keep an eye on your biggest competitors and respond accordingly when they seem to be outselling you.

News flash. None of these listicle items deliver value from my point of view. Like other buzzwords and whizzy concepts, backfilling with generalizations is not going to convince me that Big Data has “value” unless the situation is linked to cash money.

Call me old fashioned, but this approach to value is one reason many companies are struggling to generate revenue from their search and content processing efforts.

Stephen E Arnold, November 15, 2015

Crazy, Wild Hadoop Prioritization Advice

November 12, 2015

I read “Top 10 Priorities for a Successful Hadoop Implementation.” A listicle. I understand. Clicks. Visibility. Fame. Fortune. Well, hopefully.

I wanted to highlight two pieces of advice delivered in a somber, parental manner. Here are two highlights from the write up intended to help a Hadoop administrator get ‘er done and keep the paychecks rolling in.

Item 2 of 10: “Innovate with Big Data on enterprise Hadoop.” I find it amusing when advisors, poobahs, and former middle school teachers tell another person to innovate. Yep, that works really well. Even those who innovate are faced with failure many times. I think the well ran dry for some of the Italian Renaissance artists when the examples of frescos in Nero’s modest home were recycled. Been there. Done that. The notion of a person innovating with an enterprise deployment of Hadoop strikes me as interesting, but probably not a top 10 priority. How about getting the data into the system, formulating a meaningful query, and figuring out how to deal with the batchiness of the system?

Item 9 of 10: “Look for capabilities that make Hadoop data look relational.” There are reasons to use Codd-type data management systems. Those reasons include that they work when properly set up, and they require data which can be sliced and diced. Maybe not easily, but no one fools himself or herself thinking, “Gee, why don’t I dump everything into one big data lake and pull out the big, glossy fish automagically.”
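Making Hadoop data "look relational" usually means imposing a schema on raw files at read time, the idea behind SQL-on-Hadoop layers such as Apache Hive. The sketch below is not Hadoop code. It is a small Python stand-in that assumes hypothetical tab-separated raw log lines, parses them into typed rows, and loads them into an in-memory SQLite table so they can be queried with ordinary SQL.

```python
from __future__ import annotations

import sqlite3

# Hypothetical raw "data lake" lines: timestamp, user id, bytes transferred.
RAW_LINES = [
    "2015-11-12T08:00:00\tu1\t512",
    "2015-11-12T08:05:00\tu2\t2048",
    "malformed line without tabs",
    "2015-11-12T09:10:00\tu1\t1024",
]


def parse(line: str) -> tuple[str, str, int] | None:
    """Apply the schema at read time; return None for lines that do not fit it."""
    parts = line.split("\t")
    if len(parts) != 3:
        return None
    ts, user_id, nbytes = parts
    try:
        return ts, user_id, int(nbytes)
    except ValueError:
        return None


def load(lines: list[str]) -> sqlite3.Connection:
    """Load the rows that parse cleanly into a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts TEXT, user_id TEXT, bytes INTEGER)")
    rows = [row for row in map(parse, lines) if row is not None]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


conn = load(RAW_LINES)
totals = conn.execute(
    "SELECT user_id, SUM(bytes) FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()
print(totals)  # [('u1', 1536), ('u2', 2048)]
```

The malformed line is dropped without comment. That is the part the advice glosses over: relational tools work when the data can be sliced and diced, and someone still has to decide what happens to the rows that cannot be.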

I am okay with advice. Perhaps it should reflect the reality that open source data management tools present to an enterprise user seeking guidance. Enterprise search vendors got themselves into a world of hurt with this type of casual advice. Where are those vendors now?

Stephen E Arnold, November 12, 2015
