February 5, 2016
Shiver me timbers. Batten down the hatches. There is a storm brewing in the use of Autonomy-type methods to identify risks and fraud. To be fair, HP Enterprise no longer pitches Autonomy, but the spirit of Dr. Mike Lynch’s 1990s technology is there, just a hint maybe, but definitely noticeable to one who has embraced IDOL.
For the scoop, navigate to “HPE Launches Investigative Analytics, Using AI and Big Data to Identify Risk.” I was surprised that the story’s headline did not add “When Swimming in the Data Lake.” But the message is mostly clear despite the buzzwords.
Here’s a passage I highlighted:
The software is initially geared toward financial services organizations, and it combines existing HPE products like Digital Safe, IDOL, and Vertica all on one platform. By using big data analytics and artificial intelligence, it can analyze a large amount of data and help pinpoint potential risks of fraudulent behavior.
Note the IDOL thing.
The write up added:
Investigative Analytics starts by collecting both structured sources like trading systems, risk systems, pricing systems, directories, HR systems, and unstructured sources like email and chat. It then applies analysis to query “aggressively and intelligently across all those data sources,” Patrick [HP Enterprise wizard] said. Then, it creates a behavior model on top of that analysis to look at certain communication types and see if they can define a certain problematic behavior and map back to a particular historical event, so they can look out for that type of communication in the future.
This is okay, but the words, terminology, and phrasing remind me of 1990s Autonomy marketing collateral, BAE’s presentations after licensing Autonomy technology in the late 1990s, the i2 Ltd. Analyst Notebook collateral, and, more recently, the flood of jabber about Palantir’s Metropolitan Platform and Thomson Reuters’ version of Metropolitan, called QA Direct or QA Studio or QA fill-in-the-blank.
The fact that HP Enterprise is pitching this new service developed with “one bank” at a legal eagle tech conference is a bit like me offering to do my Dark Web Investigative Tools lecture at Norton Elementary School. A more appropriate audience might deliver more bang for each PowerPoint slide, might it not?
Will HP Enterprise put a dent in the vendors already pounding the carpeted halls of America’s financial institutions?
HP Enterprise stakeholders probably hope so. My hunch is that a me-too, me-too product is a less than inspiring use of the collection of acquired technologies HP Enterprise appears to put in a single basket.
Stephen E Arnold, February 5, 2016
February 5, 2016
Elasticsearch is one of the most popular open source search applications, deployed for personal as well as corporate use. It is built on another popular open source application, Apache Lucene, and was designed for horizontal scalability, reliability, and ease of use. Elasticsearch has become so pervasive that people do not realize just how often they rely on it. Eweek takes the opportunity to discuss the search application’s uses in “9 Ways Elasticsearch Helps Us, From Dawn To Dusk.”
“With more than 45 million downloads since 2012, the Elastic Stack, which includes Elasticsearch and other popular open-source tools like Logstash (data collection), Kibana (data visualization) and Beats (data shippers) makes it easy for developers to make massive amounts of structured, unstructured and time-series data available in real-time for search, logging, analytics and other use cases.”
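All that scale rests on Lucene’s core data structure, the inverted index, which maps each term to the documents containing it. A toy Python sketch of the idea, using hypothetical documents rather than a real Elasticsearch cluster:

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term."""
    sets = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*sets) if sets else set()

# Hypothetical documents standing in for an indexed corpus
docs = {
    1: "open source search application",
    2: "log analytics and data visualization",
    3: "open source log collection",
}
index = build_index(docs)
print(search(index, "open source"))  # {1, 3}
```

Real Elasticsearch adds sharding, replication, scoring, and a REST API on top, but term-to-document lookup is the engine underneath.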
How is Elasticsearch being used? The Guardian uses it to let readers interact with content daily, Microsoft Dynamics ERP and CRM use it to index and analyze social feeds, it powers Yelp, and, here is a big one, Wikimedia uses it to power the well-loved and well-used Wikipedia. We can already see how much Elasticsearch affects our daily lives without our being aware of it. Other companies that use Elasticsearch for our and their benefit include Hotels Tonight, Dell, Groupon, Quizlet, and Netflix.
Elasticsearch will continue to mature as an inexpensive alternative to proprietary software, and the number of Web services and companies that use it will only continue to grow.
February 4, 2016
I read an article that I dismissed. The title nagged at my ageing mind and dwindling intellect. “This is Why Dictators Love Big Data” did not ring my search, content processing, or Dark Web chimes.
Nagged by my inner voice, I returned to the story, still annoyed with the “This Is Why” phrase in the headline.
Predictive analytics are not new. The packaging is better.
I think this is the main point of the write up, but I am never sure with online articles. The articles can be ads or sponsored content. The authors could be looking for another job. The doubts about information today plague me.
The circled passage is:
Governments and government agencies can easily use the information every one of us makes public every day for social engineering — and even the cleverest among us is not totally immune. Do you like cycling? Have children? A certain breed of dog? Volunteer for a particular cause? This information is public, and could be used to manipulate you into giving away more sensitive information.
The only hitch in the git-along is that this is not just old news; it is ancient news. The systems and methods for making decisions based on the munching of math in numerical recipes have been around for a while. Autonomy? A pioneer in the 1990s. Nope, not far enough back. Not even the super secret use of Bayesian, Markov, and related methods during World War II reaches back far enough. Nudge the ball hundreds of years farther back on the timeline. Not new, in my opinion.
I also noted this comment:
In China, the government is rolling out a social credit score that aggregates not only a citizen’s financial worthiness, but also how patriotic he or she is, what they post on social media, and who they socialize with. If your “social credit” drops below a certain level because you post anti-government messages online or because you’re socially associated with other dissidents, you could be denied credit approval, financial opportunities, job promotions, and more.
Just China? I fear not, gentle reader. Once again the “real” journalists are taking an approach which does not do justice to the wide diffusion of certain mathy applications.
Net net: I should have skipped this write up. My initial judgment was correct. Not only is the headline annoying to me, the information is par for the Big Data course.
Stephen E Arnold, February 4, 2016
February 4, 2016
The marketplaces of the Dark Web provide an interesting case study in innovation. A piece describing three types of Dark Web fraud aimed at the hotel industry, for example, was recently published on the Cybel Blog. Delving into the types of cybercrime related to the hospitality industry, the article, like many others recently, discusses cybercriminals’ preference for trading in account login credentials rather than credit cards, because the fraud is harder to detect. Dark Web “travel agencies” are one example of cybercrime as a service:
“Dark Web “travel agencies” constitute a third type of fraud affecting hotel chains. These “agencies” offer room reservations at unbeatable prices. The low prices are explained by the fact that the seller is using fraud and hacking. The purchaser contacts the seller, specifying the hotel in which he wants to book a room. The seller deals with making the reservation and charges the service to the purchaser, generally at a price ranging from a quarter to a half of the true price per night of the room. Many sellers boast of making bookings without using stolen payment cards (reputed to be easy for hotels to detect), preferring to use loyalty points from hacked client accounts.”
What will they come up with next? The business-to-consumer (B2C) sector includes more than hotels and presents a multitude of opportunities for cybertheft. Innovation must occur on the industry side as well in order to circumvent such schemes.
Megan Feil, February 4, 2016
February 3, 2016
An article entitled “Tor and the Enterprise 2016 – Blocking Malware, Darknet Use and Rogue Nodes” from Computer World UK discusses the inevitable enterprise concerns related to anonymity networks. Tor, The Onion Router, has gained steam among mainstream internet users in the last five years. According to the article,
“It’s not hard to understand that Tor has plenty of perfectly legitimate uses (it is not our intention to stigmatise its use) but it also has plenty of troubling ones such as connecting to criminal sites on the ‘darknet’, as a channel for malware and as a way of bypassing network security. The anxiety for organisations is that it is impossible to tell which is which. Tor is not the only anonymity network designed with ultra-security in mind, The Invisible Internet Project (I2P) being another example. On top of this, VPNs and proxies also create similar risks although these are much easier to spot and block.”
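One common mitigation the article’s framing implies is matching inbound or outbound traffic against the published list of Tor exit relays. A minimal sketch of that lookup, with placeholder addresses from the RFC 5737 documentation ranges rather than real relays:

```python
# Minimal sketch: flag traffic to or from known Tor exit relays by IP lookup.
# In practice the set would be refreshed regularly from Tor's published
# exit-relay list; the addresses below are hypothetical placeholders.
KNOWN_EXIT_NODES = {"198.51.100.7", "203.0.113.42"}

def should_block(src_ip, exit_nodes=KNOWN_EXIT_NODES):
    """Return True when the source IP matches a listed exit relay."""
    return src_ip in exit_nodes

print(should_block("203.0.113.42"))  # True
print(should_block("192.0.2.10"))    # False
```

The catch, as the article notes, is that this only catches traffic leaving via known relays; bridges, VPNs, and proxies slip past a simple list check, which is why technology alone only takes the enterprise so far.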
The conclusion this article draws is that technology can only take the enterprise so far in mitigating risk. Reliance on penalties for running unauthorized applications is their suggestion, but this seems a short-sighted solution if the popularity of anonymity networks rises.
Megan Feil, February 3, 2016
February 2, 2016
The infographic on the IBM Big Data & Analytics Hub titled “Extracting Business Value From the 4 V’s of Big Data” involves quantifying Volume (scale of data), Velocity (speed of data), Veracity (certainty of data), and Variety (diversity of data). In a time when big data may have been largely demystified, IBM makes an argument for its current relevance and import, not to mention its mystique, with reminders of the tremendous amounts of data being created and consumed on a daily basis. Ultimately the graphic is an ad for the IBM Analytics Technology Platform. The infographic also references a fifth “V”:
“Big data = the ability to achieve greater Value through insights from superior analytics. Case Study: A US-based aircraft engine manufacturer now uses analytics to predict engine events that lead to costly airline disruptions, with 97% accuracy. If this prediction capability had been available in the previous year, it would have saved $63 million.”
IBM struggles for revenue. But, judging from this infographic, IBM knows how to create Value with a capital “V,” if not revenue. The IBM Analytics Technology Platform promises speedier insights and actionable information from trustworthy sources. The infographic reminds us that poor quality in data leads to sad executives, and that data is growing exponentially, with 90% of all data forged in only the last two years.
Chelsea Kerwin, February 2, 2016
February 2, 2016
A friend recently told me how she can go months avoiding suspicious emails, spyware, and Web sites on her computer, but the moment she hands her laptop over to her father, he downloads a virus within an hour. Despite the technology gap existing between generations, the story goes to show how easy it is to deceive and steal information these days. ExpertClick thinks that metadata might hold the future means for cyber security in “What Metadata And Data Analytics Mean For Data Security-And Beyond.”
The article uses a biological analogy to explain metadata’s importance: “One of my favorite analogies is that of data as proteins or molecules, coursing through the corporate body and sustaining its interrelated functions. This analogy has a special relevance to the topic of using metadata to detect data leakage and minimize information risk — but more about that in a minute.”
This plays into how new companies, like Ayasdi, use data to reveal new correlations using methods different from the standard statistical ones. The article compares this to getting to the atomic level of data, where data scientists will be able to separate data into different elements and increase the complexity of the analysis.
“The truly exciting news is that this concept is ripe for being developed to enable an even deeper type of data analytics. By taking the ‘Shape of Data’ concept and applying to a single character of data, and then capturing that shape as metadata, one could gain the ability to analyze data at an atomic level, revealing a new and unexplored frontier. Doing so could bring advanced predictive analytics to cyber security, data valuation, and counter- and anti-terrorism efforts — but I see this area of data analytics as having enormous implications in other areas as well.”
There are more devices connected to the Internet than ever before and 2016 could be the year we see a significant rise in cyber attacks. New ways to interpret data will leverage predictive and proactive analytics to create new ways to fight security breaches.
February 1, 2016
The article on Fortune titled “Has Big Data Gone Mainstream?” asks whether big data is now an expected part of data analysis. The “merger,” as Deloitte advisor Tom Davenport puts it, makes big data an indistinguishable aspect of data crunching. Only a few years ago, it was a scary buzzword that executives scrambled to understand and few experts specialized in. The article shows what has changed lately,
“Now, however, universities offer specialized master’s degrees for advanced data analytics and companies are creating their own in-house programs to train talent in data science. The Deloitte report cites networking giant Cisco as an example of a company that created an internal data science training program that over 200 employees have gone through. Because of media reports, consulting services, and analysts talking up “big data,” people now generally understand what big data means…”
Davenport sums up the trend nicely with the statement that people are tired of reading about big data and ready to “do it.” So what will replace big data as the current mysterious buzzword that irks laypeople and the C-suite simultaneously? The article suggests “cognitive computing” or computer systems using artificial intelligence for speech recognition, object identification, and machine learning. Buzz, buzz!
Chelsea Kerwin, February 1, 2016
January 31, 2016
I read “8 Ways IBM Watson Analytics Is Transforming Business.” My initial reaction was, “If that were true, why is IBM stuck in a revenue decline?” IBM itself should be the exemplary case for the efficacy of IBM Watson.
IBM is struggling. I think the company has reported 15 consecutive quarters of revenue decline. Let’s see. Yes, that works out to nearly four years of downhill sledding.
The write up ignores the obvious disconnect between what IBM asserts Watson can do and IBM’s own business performance. The reality is that if Watson were so darned wonderful, IBM’s financial results should reflect that insider advantage.
Here’s the part of the write up I highlighted with my Big Blue red ink marker:
- A Kentucky truck company is raking in the dough via Watson Analytics. Okay.
- A company engaged in social housing and health care is figuring out how not to injure workers. Okay.
- An outfit is identifying opportunities on the Australian stock exchange. I assume Watson is recommending IBM as a strong buy.
- A franchised patient taxi service is analyzing data from its transport services. But where’s Uber? What is Uber using for analytics? Okay.
- A marketing outfit in Texas takes time out from standing on line at Franklin Barbecue to correlate data. Okay but I think Franklin’s figures out customer demand by looking out the window of the restaurant.
- A hospitality planning service firm for college sports can figure out what to do when selling yummy hot dogs and serving cold, refreshing buttermilk to thirsty sports fans. Okay.
- A university (yes, a university with a statistics department) uses Watson to figure out how “to leverage social sentiment.” I wonder if the university queries graduates about their student loans versus employment prospects? Okay. Well, maybe not okay.
- Another university uses Watson in its actual classes. What about IBM SPSS? Wait, maybe that’s Watson Analytics. Students will be almost as excited as I was to do the statistics exercises, but I did not get to use Watson. I had to use a pencil and paper.
My take on this article? IBM does not have compelling use cases. In fact, these examples illustrate that IBM is struggling to dress up analytics in marketing finery.
Uber? What’s Uber using for its ride analytics?
Stephen E Arnold, January 31, 2016
January 22, 2016
One of the best things about data and numbers is that they do not lie…usually. According to Slate’s article, “FTC Report Details How Big Data Can Discriminate Against The Poor,” big data does a huge disservice to people of lower socioeconomic status by reinforcing existing negative patterns. The Federal Trade Commission (FTC), academics, and activists have expressed concern for some time about how big data analytics can work against the disadvantaged.
“At its worst, big data can reinforce—and perhaps even amplify—existing disparities, partly because predictive technologies tend to recycle existing patterns instead of creating new openings. They can be especially dangerous when they inform decisions about people’s access to healthcare, credit, housing, and more. For instance, some data suggests that those who live close to their workplaces are likely to maintain their employment for longer. If companies decided to take that into account when hiring, it could be accidentally discriminatory because of the racialized makeup of some neighborhoods.”
The FTC stresses that big data analytics has positive benefits as well. It can yield information that creates more job opportunities, transforms health care delivery, extends credit through “non-traditional” methods, and more.
The way big data can avoid reinforcing these problems, and even improve upon them, is to account for potential biases from the beginning. Large data sets can make these problems invisible or harder to recognize. Companies can use prejudiced data to justify the actions they take and even weaken the effectiveness of consumer choice.
Data is supposed to be an objective tool, but the sources behind the data can be questionable. It becomes important for third parties and the companies themselves to investigate the data sources, run multiple tests, and confirm that the data is truly objective. Otherwise we will be dealing with social problems, and more, reinforced by bad data.