What Lurks in the Dark Web?

October 20, 2016

Organizations concerned about cyber security can effectively thwart threats only if they know a threat is lurking in the dark. An Israeli SaaS-based startup claims it can bridge this gap by offering real-time analysis of data on the Dark Web.

TechCrunch, in an article titled “Sixgill claims to crawl the Dark Web to detect future cybercrime,” says:

Sixgill has developed proprietary algorithms and tech to connect the Dark Web’s dots by analyzing so-called “big data” to create profiles and patterns of Dark Web users and their hidden social networks. It’s via the automatic crunching of this data that the company claims to be able to identify and track potential hackers who may be planning malicious and illegal activity.

By analyzing the data, Sixgill claims that it can identify illegal marketplaces, data leaks and also physical attacks on organizations using its proprietary algorithms. However, there are multiple loopholes in this type of setup.

First, some Dark Web actors can easily insert red herrings across communication channels to divert attention from real threats. Second, the Dark Web was created by individuals who wished to keep their communications cloaked. Mining data and crunching it through algorithms is not sufficient to keep organizations safe. Moreover, AI can only process data that has been mined by algorithms, and that data can in many cases be false. Tor is undergoing changes to increase the safeguards in place for its users. What’s beginning is a Dark Web arms race. A pattern of compromise will be followed by hardening. Then compromise will occur again, and the Hegelian cycle repeats.

Vishal Ingole, October 20, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Big Data and Visualization: The Ham and Eggs of Analysis

October 14, 2016

I read “Big Data Is Useless without Visual Analytics.” (Nope, I won’t comment on the fact that “data” is a plural.) The main point of the article is that looking at a table is not as easy as looking at a graphic, preferably a Hollywood-style, presentation-ready visualization. If you want to see a nifty visualization, check out Darktrace’s three-dimensional, real-time visualizations.

The write up informed me:

Visualizations are valuable because they display a lot of data in an easy-to-understand visual format that works well for our visually-oriented minds.

Okay. A lot. Good.

I learned that “data mining is too complicated for most uses of Big Data.”

No kidding. Understanding data validity, math with funny symbols, and numerical recipes which make the numbers conform to the silliness taught in university statistics classes, and then making justifiable decisions about all of it: these are difficult tasks for avid Facebook users and YouTube content consumers to deal with.

I understand. Some folks are just really busy.

The write up explains that Excel is not the proper tool for real information analysis. Never mind that Excel remains a reasonably popular chunk of software. Some Excel lovers write memos and reports in Excel. That’s software love.

So what’s the fix?

Surprisingly, the write up does not provide one. But there is a link which allows me to download a report from my pals at IDC. You remember IDC, right? That is the outfit which tried to sell my content on Amazon without my permission and without having an agreement with me to publish my research. If you have forgotten what I call the “Schubmehl play,” you can get some of the details at this link.

Nice write up. Too bad it lacks useful content on the subject presented in the headline. But what else does one expect these days?

Stephen E Arnold, October 14, 2016

Mid Tier Consulting Firm: Big Data Fear

October 12, 2016

I love it when mid tier consulting firms become contrarians. Navigate to “Gartner Warns Big Data’s Bubble May Burst As Enterprises Cut Investment.” The write up informed me:

Gartner has found. In its latest survey of 199 technology executives, the analyst firm found that many companies have struggled to obtain insights that make a real difference to their bottom line.

I don’t want to get into the Statistics 101 lecture about sampling, but keep in mind that there may be some wobbles in who was asked and who answered the “survey.”

Let’s assume that the mid-tier outfit did a pretty good job with its 199 objective respondents. I learned that:

the number of companies that are planning to invest in Big Data in the next two years has fallen by 6 percent, from 31 percent in 2015 to just 25 percent this year. Another telling statistic is that while roughly three-quarters of companies have invested, or are planning to invest, in Big Data, the overwhelming majority of those firms remain stuck at the pilot stage.
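A quick worked check of the quoted figures, as a minimal Python sketch. A drop from 31 percent to 25 percent is 6 percentage points, which is roughly a 19 percent decline relative to 2015. The sketch also computes the textbook 95 percent margin of error for a proportion at n = 199. That formula assumes a simple random sample, an assumption the survey may not meet, and the 2015 sample size is not given, so treat the result as a rough sense of scale, not a test of the change.

```python
import math

# Figures quoted in the write up: share of companies planning to invest
# in Big Data in the next two years, plus the reported sample size.
share_2015 = 0.31
share_2016 = 0.25
n = 199  # respondents in the latest survey

# Absolute change, in percentage points.
point_drop = (share_2015 - share_2016) * 100

# Relative change, as a fraction of the 2015 figure.
relative_drop = (share_2015 - share_2016) / share_2015

# Textbook 95 percent margin of error for a single proportion.
# Assumes a simple random sample, which this survey may not be.
z = 1.96
margin = z * math.sqrt(share_2016 * (1 - share_2016) / n)

print(f"drop: {point_drop:.0f} points, {relative_drop:.1%} relative")
print(f"margin of error at n={n}: +/- {margin * 100:.1f} points")
```

Under those assumptions the margin of error comes out near plus or minus 6 points, about the same size as the reported drop.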

The write up points to another mid tier outfit’s research which suggests that Big Data may not be the home run that some pundits assert. Is Big Data doomed? Nah, a third mid tier outfit predicts that the Big Data market will grow “three times as fast as the overall information technology market.”

Whew. For a moment I thought the sky was falling.

Several observations:

  • Fear sells.
  • Uncertainty sells.
  • Seemingly authoritative research sells.

What’s the common factor? The mid tier outfits are working overtime to generate interest in their services. Perhaps the economy is putting some pressure on the mid tier folks. Go with fear.

Even snakes flee from earth tremors. There may be Big Data to quantify that fear as long as one can point and click, not think about data integrity, and not have to do math. I love it. 199.

Stephen E Arnold, October 12, 2016

Pharmaceutical Research Made Simple

October 3, 2016

Pharmaceutical companies are a major power in the United States. Their power comes from the medicine they produce and the wealth they generate. In order to maintain both wealth and power, pharmaceutical companies conduct a lot of market research. Market research is a field based on people’s opinions and reactions; in other words, it contains information that is hard to process into black and white data. Lexalytics is a big data platform built around sentiment analysis to turn market research into usable data.

Inside Big Data explains how “Lexalytics Radically Simplifies Market Research And Voice Of Customer Programs For The Pharmaceutical Industry” with a new package called the Pharmaceutical Industry Pack. Lexalytics uses a combination of machine learning and natural language processing to understand the meaning and sentiment in text documents. The new pack can help pharmaceutical companies interpret how their customers react to medications, what their symptoms are, and what side effects the medications may have.

‘Our customers in the pharmaceutical industry have told us that they’re inundated with unstructured data from social conversations, news media, surveys and other text, and are looking for a way to make sense of it all and act on it,’ said Jeff Catlin, CEO of Lexalytics. ‘With the Pharmaceutical Industry Pack — the latest in our series of industry-specific text analytics packages — we’re excited to dramatically simplify the jobs of CEM and VOC pros, market researchers and social marketers in this field.’

Along with basic natural language processing features, the Lexalytics Pharmaceutical Industry Pack contains 7,000 sentiment terms from healthcare content as well as other medical references to understand market research data. Lexalytics makes market research easy and offers invaluable insights that would otherwise go unnoticed.
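Lexalytics does not say how its pack applies those terms, so what follows is only a generic illustration of lexicon-based sentiment scoring, sketched in Python. The lexicon entries, their weights, and the one-word negation rule are hypothetical; a production system would add phrase handling, machine learning, and thousands more terms.

```python
import re

# Hypothetical mini-lexicon: term -> polarity weight. These entries are
# invented for illustration and are not drawn from any vendor's pack.
LEXICON = {
    "relief": 1.0,
    "effective": 1.0,
    "nausea": -1.0,
    "dizzy": -1.0,
    "headache": -0.5,
}

# Words that flip the polarity of the term immediately after them.
NEGATORS = {"no", "not", "never"}


def score(text: str) -> float:
    """Sum lexicon weights over tokens, flipping a term preceded by a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok)
        if weight is None:
            continue  # token carries no sentiment in this lexicon
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight  # "no nausea" counts as positive
        total += weight
    return total


print(score("Real relief, no nausea, but a mild headache."))
```

Here “relief” adds 1, the negated “nausea” adds 1, and “headache” subtracts 0.5, for a score of 1.5.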

Whitney Grace, October 3, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Big Data Superficialities

October 2, 2016

When one needs to understand Big Data, what’s the go-to resource? A listicle of Big Data generalities. Navigate to “6 Illusions Execs Have About Big Data.” The article points out that Big Data is a buzzword. Shocker. And the chimeras identified? Here you go:

  • All data is Big Data. Yep, a categorical affirmative. Love those “all’s”.
  • Big Data solves every problem. Another categorical affirmative. Whether it is the Zuck’s curing “all” disease or Big Data dealing with “every” problem, the generalization is rock solid silliness.
  • Big Data is meaningless. The statement leads to this parental observation: “To make big data less meaningless, you need to be able to process and use it.” I am curious about the cost, method, and accuracy of the outputs in the real world.
  • Big Data is easy. The enumeration of the attributes of a pair of women’s shoes resonates with me. I like flats.
  • Imperfect Big Data is useless. Nope. Many data sets have imperfections. The hard work is normalizing and cleaning the information.
  • Only big companies need big data. I like the balanced sentence structure and repetition. The reality, however, is that small outfits often struggle with little data. A data set can easily overwhelm the small outfit’s resources and act like a stuck parking brake when closing deals and generating revenue are Jobs 1 and 2.
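The normalizing and cleaning mentioned in the list above can be sketched in a few lines of Python. This is a minimal illustration that assumes hypothetical records with “name” and “state” fields; real cleaning pipelines involve far more rules and a good deal of judgment.

```python
import string


def clean(records):
    """Normalize hypothetical name/state records; drop blanks and duplicates."""
    seen = set()
    out = []
    for rec in records:
        # capwords collapses runs of whitespace and capitalizes each word
        # (unlike str.title, it leaves "acme's" as "Acme's").
        name = string.capwords(rec.get("name", ""))
        state = rec.get("state", "").strip().upper()
        if not name:
            continue  # skip rows with no usable name
        key = (name, state)
        if key in seen:
            continue  # skip rows that differ only in spacing or case
        seen.add(key)
        out.append({"name": name, "state": state})
    return out


raw = [
    {"name": "  acme's   supply", "state": "ky "},
    {"name": "Acme's Supply", "state": "KY"},
    {"name": "", "state": "IN"},
]
print(clean(raw))
```

Three messy rows reduce to a single record, which is the point: the dull work happens before any analysis.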

Amazing stuff. When I encounter information similar to that contained in the source document, I understand how many vendors close deals, pocket dough, and leave the lucky buyers wondering what happened to their payoff from Big Data.

Stephen E Arnold, October 2, 2016

Brontobytes on the Move

September 21, 2016

I read “Basic Understanding of Big Data. What Is This and How It Is Going to Solve Complex Problems.” My initial reaction was that the article recycled Hewlett Packard Enterprise marketing exhaust. The author describes himself as:

a Software Geek, graduated  in Computer Science and Engineering, and currently working as a BigData Developer in one of the leading MultiNational Company. (sic)

For me, the main idea in the article is that there is a great deal of Big Data sloshing around the datasphere. For proof, the write up reproduces an image from Hewlett Packard, an outfit now focused on infrastructure and consulting. The software has been spun off into an arm’s length outfit to the applause of lawyers, investment bankers, and accountants.

It appears that HPE has calculated that every 60 seconds these digital emissions take place:

  1. More than 98,000 tweets
  2. 595,000 Facebook status updates
  3. 11 million instant messages
  4. 698,445 Google searches
  5. More than 168 million emails sent
  6. 1,820 terabytes of data created (no word on whether this is double counting of the tweets, instant messages, etc.)
  7. 217 new mobile Web users.
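The figures carry no source, but one cheap sanity check is to restate them at other time scales and compare the results with numbers that do have a named source. A minimal Python sketch using only the per-minute values listed above; it rescales them and does nothing to validate them.

```python
# Per-minute figures as reported in the HPE graphic (no source given).
PER_MINUTE = {
    "tweets": 98_000,
    "Facebook status updates": 595_000,
    "instant messages": 11_000_000,
    "Google searches": 698_445,
    "emails sent": 168_000_000,
    "terabytes created": 1_820,
    "new mobile Web users": 217,
}

MINUTES_PER_DAY = 24 * 60


def per_second(per_minute):
    """Convert per-minute counts to per-second rates."""
    return {k: v / 60 for k, v in per_minute.items()}


def per_day(per_minute):
    """Convert per-minute counts to daily totals."""
    return {k: v * MINUTES_PER_DAY for k, v in per_minute.items()}


sec = per_second(PER_MINUTE)
day = per_day(PER_MINUTE)
for item in PER_MINUTE:
    print(f"{item}: {sec[item]:,.1f} per second, {day[item]:,} per day")
```

Taken at face value, the email line implies about 2.8 million messages a second, or roughly 242 billion a day.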

Nifty numbers but no footnote. My question? What are the sources of these data?

The article then trots out the Dracula lurking in the shadows: Big Data. Again Hewlett Packard Enterprise becomes the source of the fangs. A visual which looks like a PowerPoint slide says that we have “gone beyond the decimal systems.” Really? I particularly liked the introduction of the “brontobyte.” According to the art work, a brontobyte is a one followed by 27 zeros. Oh, I thought the decimal system was dead, yet we are using it to explain this big numeric concept. Strikes me as goose feathers.
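The arithmetic behind the goose feathers: a brontobyte, as the graphic defines it, is 10 to the 27th power bytes, or a thousand yottabytes, still counted in powers of ten. “Brontobyte” is an informal label, not an SI prefix; in 2022 the SI adopted “ronna” for 10^27 and “quetta” for 10^30. A minimal Python sketch:

```python
# Decimal (SI) byte multiples up to the yottabyte.
PREFIXES = {
    "kilobyte": 10**3,
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
    "yottabyte": 10**24,
}

# Informal, non-SI term; value per the graphic: a one followed by 27 zeros.
BRONTOBYTE = 10**27


def in_units(n_bytes: int, unit: str) -> float:
    """Express a byte count in one of the decimal units above."""
    return n_bytes / PREFIXES[unit]


print(len(str(BRONTOBYTE)) - 1)  # zeros after the leading 1
print(in_units(BRONTOBYTE, "yottabyte"))
```

The first line prints 27 and the second prints 1000.0: the “new” unit is one more factor of a thousand in the same decimal ladder.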

I was thrilled to see that the article then shifts gears to embrace IBM’s jargon about volume, variety, and velocity or the three Vs of Big Data. Data problems are really bad when the three Vs come into the game.

The article then introduces some technical concepts which have been kicking around since the late 1990s.

The author then tells me:

this article talks about only a glass of water from the entire ocean. Go get started and take a dip dive in the bigdata world or if i can say BigData Planet :)

Several observations:

  1. Recycling the marketing of a company like Hewlett Packard does not inspire confidence in the factual foundation of the article.
  2. Dipping into IBM’s marketing jargon just makes me nervous. IBM has been lining up declining revenues for years.
  3. Urging me to “take a dip” in the “bigdata world” (sic) is advice that is wonky.

What I liked about the write up was the inclusion of the report that “we have gone beyond the decimal system.” Sounds crazy to me. What if the brontobyte is carnivorous and hungry for Big Data expert stew?

Stephen E Arnold, September 21, 2016

A New Spin on Big Data and the Dark Web

September 12, 2016

While most of us are occupied with finding the best Labor Day deals on the Internet, Chris White is developing technologies to save lives from the dark world of the sex trade.

In the article “The Man Who Lit the Dark Web,” the author states:

An estimated 21 million people are being trafficked around the planet. More than half are women and girls. More than 1 million are children. Nearly one-quarter are bought and sold as sex slaves. Only 1-in-100 victims of human trafficking is ever rescued. It’s a booming business. High profits and low risk make human trafficking one of the fastest-growing and most lucrative crimes on the planet; the U.N. recently estimated that trafficking nets $150 billion a year.

With the Dark Web, traffickers have realized that it’s easier for them to operate away from the eyes of law enforcement. The article asserts:

The “surface” Web, or open Web, represents between 5 and 20 percent of what’s out there. The rest resides in places that most crawlers can’t reach or index. Some data are “deep,” in password-protected places like social media and message boards, or in increasingly common dynamic websites—which are more like apps than pages from a book, and change when you interact with them, like Kayak. The rest of the Web is “dark.”

White’s approach is to tackle the Dark Web with Big Data. The author of the article spent a decade of his life helping the US Army track, penetrate, and destroy the financial networks of terrorist organizations. Will the Big Data approach actually work?

Certainly White, a Microsoft employee, is helping. Agencies like Defense Advanced Research Projects Agency (DARPA) and similar organizations may have to channel research funds into initiatives like White’s. Otherwise, the payoff from commercial innovations will put a lid on efforts like White’s.

Vishal Ingole, September 12, 2016

Big Data Processing Is Relative to Paradigm of Today

September 7, 2016

The size and volume that characterize an information set as big data, and the tools used to process it, are relative to the current era. A story from NPR, “Can Web Search Predict Cancer? Promise And Worry Of Big Data And Health,” reminds us of this. In 1600s England, a statistician essentially founded demography by compiling details of death records into tables. Today, trends in big data are drawn through a combination of computer technology and people’s analytical skills. Microsoft scientists conducted a study showing that Bing search queries may hold clues to a future diagnosis of pancreatic cancer.

The Microsoft scientists themselves acknowledge this [lack of comprehensive knowledge and predictive abilities] in the study. “Clinical trials are necessary to understand whether our learned model has practical utility, including in combination with other screening methods,” they write. Therein lies the crux of this big data future: It’s a logical progression for the modern hyper-connected world, but one that will continue to require the solid grounding of a traditional health professional, to steer data toward usefulness, to avoid unwarranted anxiety or even unnecessary testing, and to zero in on actual causes, not just correlations within particular health trends.

Because people produce the data points in many social-related data sets, and were the original analyzers of big data, it makes sense that they remain a key part of big data analytics. This may be especially pertinent in health, a sector where the point may be more intuitively understood than in others. Whether in health or another sector, can the human variable ever be taken out of the data equation? Perhaps such a world will give rise to whatever comes after the current buzz around the phrase big data.

Megan Feil, September 7, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meetup on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/

Real Time Data Analysis for Almost Anyone

August 25, 2016

The idea that Everyman can tap into a real time data stream and perform “analyses” is like catnip for some. The concept appeals to those in the financial sector, but these folks often have money (yours and mine) to burn. The idea seems to snag the attention of some folks in the intelligence sector who want to “make sense” out of Twitter streams and similar flows of “social media.” In my experience, big outfits with a need to tap into data streams have motivation and resources. Most of those who fit into my pigeonhole have their own vendors, systems, and methods in place.

The question is, “Does Tom’s Trucking need to tap into real time data flows to make decisions about what paint to stock or what marketing pitch to use on the business card taped to the local grocery’s announcement board?”

I plucked from my almost real time Web information service (Overflight) two articles suggesting that there is money in “them thar hills” of data.

The first is “New Amazon Service Uses SQL To Query Streaming Big Data.” Amazon is a leader in the cloud space. The company may not be number one on the Gartner hit parade, but some of those with whom I converse believe that Amazon continues to be the cloud vendor to consider and maybe use. The digital Wal-Mart has demonstrated both revenue and innovation with its cloud business.

The article explains that Amazon has picked up the threads of Hadoop, SQL, and assorted enabling technologies and woven them into Amazon Kinesis Analytics. The idea is that Amazon delivers a piping hot Big Data pizza via a SQL query. The write up quotes an Amazon wizard as saying:

“Being able to continuously query and gain insights from this information in real-time — as it arrives — can allow companies to respond more quickly to business and customer needs,” AWS said in a statement. “However, existing data processing and analytics solutions aren’t able to continuously process this ‘fast moving’ data, so customers have had to develop streaming data processing applications — which can take months to build and fine-tune — and invest in infrastructure to handle high-speed, high-volume data streams that might include tens of millions of events per hour.”
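Amazon’s service expresses continuous queries in SQL. The Python sketch below is not its API; it is only a hedged illustration of one common pattern behind such queries: counting events in fixed, non-overlapping (tumbling) time windows as they arrive. The event shape, integer timestamps, and in-order arrival are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in whole seconds, event type).
Event = Tuple[int, str]


def tumbling_counts(events: Iterable[Event], window: int) -> Iterator[Tuple[int, dict]]:
    """Yield (window_start, counts) each time a window closes.

    Assumes events arrive in timestamp order. Windows with no events
    are not emitted.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, kind in events:
        start = ts - ts % window  # start of the window this event falls into
        if current_start is None:
            current_start = start
        if start != current_start:
            # A newer window has begun, so the previous one is complete.
            yield current_start, dict(counts)
            counts = Counter()
            current_start = start
        counts[kind] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last, possibly partial, window


stream = [(1, "click"), (3, "view"), (9, "click"), (12, "click"), (25, "view")]
for start, counts in tumbling_counts(stream, window=10):
    print(start, counts)
```

Because the function is a generator, each window’s result is available as soon as the next window starts, rather than after the whole stream has been read; that is the “as it arrives” part of the pitch.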

Additional details appear in Amazon’s blog post here. The idea is that anyone with some knowledge of things Amazon, coding expertise, and a Big Data stream can use the Amazon service.

The second write up is “Microsoft Power BI Dashboards Deliver Real Time Data.” The idea seems to be that Microsoft is in the real time data analysis poker game as well. The write up reveals:

Power BI’s real-time dashboards — known as Real-Time Dashboard tiles — builds on the earlier Power BI REST APIs release to create real-time tiles within minutes. The tiles push data to the Power BI REST APIs from streams of data created in PubNub, a real-time data streaming service currently used widely for building web, mobile and IoT applications.

The idea is that a person knows the Microsoft methods, codes the Microsoft way, and has a stream of Big Data. The user then examines the outputs via “tiles.” These are updated in real time. As mentioned above, Microsoft is the Big Data Big Dog in the Gartner kennel. Obviously Microsoft will be price competitive, with the service priced at about $10 per month. The original price was about $40 a month, but the cost cutting fever is raging in Redmond.

The question is, “Which of these services will dominate?” Who knows? Amazon has a business and a real time pitch which makes sense to those who have come to depend on the AWS services. Microsoft has business customers, Windows 10, and a reseller/consulting community eager to generate revenue.

My thought is, “Pick your horse, put down your bet, and head to the Real Time Data Analytics race track.” Tomorrow’s $100 ticket is only a few bucks today. The race to low cost entry fees is about to begin.

Stephen E Arnold, August 25, 2016

HonkinNews for August 16, 2016

August 16, 2016

The weekly news program about search, online, and content processing is now available at https://youtu.be/mE3MGlmrUWc. In addition to comments about Goo!Hoo, IBM, and Microsoft, you will learn about grilling squirrel over a wood fire. Live from Harrod’s Creek.

Stephen E Arnold, August 16, 2016
