Hewlett Packard Makes Haven Commercially Available

July 19, 2016

The InformationWeek article titled “HPE’s Machine Learning APIs, MIT’s Sports Analytics Trends: Big Data Roundup” analyzes Haven OnDemand, a large part of Hewlett Packard Enterprise’s big data strategy. For a look at the smart software coming out of HPE, check out this video. The article states,

“HPE’s announcement this week brings HPE Haven OnDemand as a service on the Microsoft Azure platform and provides more than 60 APIs and services that deliver deep learning analytics on a wide range of data, including text, audio, image, social, Web, and video. Customers can start with a freemium service that enables development and testing for free, and grow into a usage and SLA-based commercial model for enterprises.”

You may notice from the video that the visualizations look a great deal like Autonomy IDOL’s visualizations from the early 2000s. That is, dated, especially when compared to visualizations from other firms. But IDOL may have a new name: Haven. According to the article, that name is actually a relaxed acronym for Hadoop, Autonomy IDOL, HP Vertica, Enterprise Security Products, and “n” or infinite applications. HPE promises that this cloud platform with machine learning APIs will assist companies in growing mobile and enterprise applications. The question is, “Can 1990s technology provide what 2016 managers expect?”
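For readers who want to poke at the service themselves, here is a minimal sketch of calling one of the Haven OnDemand text analysis APIs over REST. The endpoint path, parameter names, and response fields shown are assumptions based on how the service was documented at the time, not guarantees.

```python
# Hypothetical sketch of calling a Haven OnDemand text API.
# Endpoint path, parameter names, and response fields are assumptions,
# not verified against the current service.
import requests

API_KEY = "your-havenondemand-api-key"  # placeholder
URL = "https://api.havenondemand.com/1/api/sync/analyzesentiment/v1"

resp = requests.get(URL, params={
    "apikey": API_KEY,
    "text": "The new release is surprisingly fast and easy to use.",
})
resp.raise_for_status()
result = resp.json()

# Print whatever aggregate score the service returns (field name assumed).
print(result.get("aggregate", result))
```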

 

Chelsea Kerwin, July 19, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

There is a Louisville, Kentucky Hidden Web/Dark Web meet up on July 26, 2016. Information is at this link: http://bit.ly/29tVKpx.

The Machine Learning Textbook

July 19, 2016

Deep learning is another bit of technical jargon floating around, and it is tied to artificial intelligence.  We know that artificial intelligence is the process of replicating human thought patterns and actions through computer software.  Deep learning is…well, what specifically?  To get a primer on what deep learning is as well as its many applications, check out “Deep Learning: An MIT Press Book” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

Here is how the Deep Learning book is described:

“The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free. The print version will be available for sale soon.”

This is a fantastic resource to take advantage of.  MIT is one of the leading technical schools in the nation, if not the world, and material published under its imprint should go a long way toward rounding out your deep learning foundation.  Also, it is free, which cannot be beaten.  Here is how the book explains the goal of machine learning:

“This book is about a solution to these more intuitive problems.  This solution is to allow computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all of the knowledge that the computer needs.”
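To make the “hierarchy of concepts” idea concrete, here is a minimal sketch (our own, not from the book) of a tiny two-layer network in NumPy. The hidden layer learns simple intermediate concepts; the output layer combines them into a decision a single layer cannot represent, in this case XOR.

```python
# Minimal sketch (not from the book): a two-layer network learns XOR,
# illustrating how a "deeper" concept is built from simpler ones.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # layer 1: simple features
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # layer 2: combines them

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)          # hidden "concepts"
    out = sigmoid(h @ W2 + b2)        # final decision
    grad_out = (out - y) * out * (1 - out)
    grad_h = grad_out @ W2.T * h * (1 - h)
    W2 -= 0.5 * h.T @ grad_out; b2 -= 0.5 * grad_out.sum(axis=0)
    W1 -= 0.5 * X.T @ grad_h;   b1 -= 0.5 * grad_h.sum(axis=0)

print(np.round(out, 2))  # should approach [[0], [1], [1], [0]]
```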

If you have time, take a detour and read the book, or, if you want to save time, there is always Wikipedia.

 

Whitney Grace, July 19, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


 

Machine Learning: Learn Now

July 18, 2016

If you want the basics taught in most universities, you can start with the papers listed at this link. If you come away from these write ups with some questions, you can refresh your knowledge of Bayesian machine learning in a paper of the same name. To get a sense of some limitations of the much-hyped “new” approach to smart software, check out this sort-of slideshow, sort-of lecture called “What’s Wrong with Deep Learning?” Balanced views are difficult to track down. There are the cheerleaders, and then there are some implementers and doubters. A representative example of the cheerleaders is the ad hoc team of Google, Microsoft, some startups, research computing outfits, and lots of academics. The doubters are old people like myself who have had to deal with the interesting “drift” which can creep into deep learning systems. What’s drift, you may ask? Well, you expect one thing and get another. No human knows why. That’s drift.
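For readers who want to see drift rather than take an old person’s word for it, here is a small illustrative sketch with synthetic data: a crude classifier tuned on yesterday’s distribution quietly loses accuracy when the incoming data shifts. Every number in it is made up.

```python
# Illustrative sketch of "drift": a classifier trained on one data
# distribution degrades when the live data quietly shifts. Data is synthetic.
import numpy as np

rng = np.random.default_rng(42)

def make_data(n, shift=0.0):
    # two Gaussian classes; `shift` moves class 1 toward class 0
    x0 = rng.normal(loc=0.0, scale=1.0, size=(n, 2))
    x1 = rng.normal(loc=2.0 - shift, scale=1.0, size=(n, 2))
    return np.vstack([x0, x1]), np.array([0] * n + [1] * n)

# "Train": a trivial threshold on the feature mean, learned from old data
X_old, y_old = make_data(500)
threshold = X_old.mean()          # crude decision boundary

def accuracy(X, y):
    pred = (X.mean(axis=1) > threshold).astype(int)
    return (pred == y).mean()

X_same,  y_same  = make_data(500, shift=0.0)   # same distribution
X_drift, y_drift = make_data(500, shift=1.5)   # drifted distribution

print("accuracy, no drift:  ", round(accuracy(X_same, y_same), 2))
print("accuracy, with drift:", round(accuracy(X_drift, y_drift), 2))
```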

Stephen E Arnold, July 18, 2016

The Dog Ate My Homework: The Google Approach

July 18, 2016

I read “Google Given Extra Six Weeks to Sort Its Act Out in EU Android Antitrust Probe.” When I was in college, I was annoyed when students missed the professor’s deadline. My approach was to manage my time within the boundaries set by the person who was “teaching” me. I taught (believe it or not) for a short time while I was working on my PhD in some obscure subject area related to medieval poetry and heard some interesting excuses for deadline misses; for example:

  • The dog tore up my report.
  • My mother was robbed.
  • It was raining and I did not want my homework to get wet.

Wonderful and somewhat entertaining.

The Google variation, according to the write up:

“Google asked for additional time to review the documents in the case file.”

Ah, slow readers challenged by time management.

Several questions flashed through my mind:

  • Will Google ask for additional delays, biding its time until the EU implodes?
  • Will Google show up and then have its legal eagles engage in swoops and dives to divert the legal air flows?
  • Will Google just continue along its path knowing that everyone in the EU uses Google services so the legal dust up is Sturm und Drang with some political laser lights flashing?

Worth watching.

Stephen E Arnold, July 18, 2016

Attivio Targets Profitability by the End of 2016 Through $31M Financing Round

July 18, 2016

The article on VentureBeat titled “Attivio Raises $31 Million to Help Companies Make Sense of Big Data” discusses the promises of profitability that Attivio has made since its inception in 2007. According to Crunchbase, the search vendor has raised over $100 million from four investors. In March 2016, the company closed a financing round at $31 million with the expectation of becoming profitable by the end of 2016. The article explains,

“Our increased investment underscores our belief that Attivio has game-changing capabilities for enterprises that have yet to unlock the full value of Big Data,” said Oak Investment Partners’ managing partner, Edward F. Glassmeyer. Attivio also highlighted such recent business victories as landing lab equipment maker Thermo Fisher Scientific as a client and partnering with medical informatics shop PerkinElmer. Oak Investment Partners, General Electric Pension Trust, and Tenth Avenue Holdings participated in the investment, which pushed Attivio’s funding to at least $102 million.”

In the VentureBeat profile about the deal, Stephen Baker, CEO of Attivio, makes it clear that 2015 was a turning point for the company, or in his words, “a watershed year.” Attivio prides itself on both speeding up the data preparation process and empowering its customers to “achieve true Data Dexterity.”  And, hopefully, the company will also be profitable soon.

 

Chelsea Kerwin, July 18, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph



The Web, the Deep Web, and the Dark Web

July 18, 2016

As if it were not challenge enough to understand how the Internet works and avoid identity theft, try carving through the various layers of the Internet, such as the Deep Web and the Dark Web.  It gets confusing, but “Big Data And The Deep, Dark Web” from Data Informed clears up some of the clouds that darken Internet browsing.

The differences between the three are not that difficult to understand once they are spelled out.  The Web is the part of the Internet that we use daily to check our email, read the news, check social media sites, etc.  The Deep Web is the sector of the Internet not readily picked up by search engines.  It includes password-protected sites, very specific information such as booking a flight with a particular airline on a certain date, and the Tor servers that allow users to browse anonymously.  The Dark Web consists of Web pages that are not indexed by search engines and that sell illegal goods and services.

“We do not know everything about the Dark Web, much less the extent of its reach.

“What we do know is that the deep web has between 400 and 550 times more public information than the surface web. More than 200,000 deep web sites currently exist. Together, the 60 largest deep web sites contain around 750 terabytes of data, surpassing the size of the entire surface web by 40 times. Compared with the few billion individual documents on the surface web, 550 billion individual documents can be found on the deep web. A total of 95 percent of the deep web is publically accessible, meaning no fees or subscriptions.”

The biggest seller on the Dark Web is child pornography.  Most of the transactions take place using Bitcoin, with an estimated $56,000 in daily sales.  Criminals are not the only ones who use the Dark Web; whistle-blowers, journalists, and security organizations use it as well.  Big data has barely scratched the surface when it comes to mining the Dark Web, but those interested can find information and do their own mining with a little digging.

 

Whitney Grace, July 18, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Elasticsearch API Calls

July 17, 2016

Short honk: Are you a fan of Elasticsearch, the Lucene-based open source system giving proprietary vendors of search systems a migraine? If you are, you will want to point your browser at “Elasticsearch-API Info.” The information is presented in a table which lists and annotates Elasticsearch’s APIs from bulk to update. Useful stuff.
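For those who prefer to see the APIs in action rather than in a table, here is a hedged sketch of the bulk and update calls made through the plain REST interface. The paths follow the Elasticsearch 2.x conventions of the period, and the index, type, and document contents are invented for illustration.

```python
# A hedged sketch of two of the listed APIs, hit through the plain REST
# interface (paths follow Elasticsearch 2.x conventions; index, type, and
# documents are made up for illustration).
import json
import requests

ES = "http://localhost:9200"

# Bulk API: newline-delimited action/document pairs
bulk_body = "\n".join([
    json.dumps({"index": {"_index": "articles", "_type": "post", "_id": "1"}}),
    json.dumps({"title": "Elasticsearch API Calls", "views": 0}),
    json.dumps({"index": {"_index": "articles", "_type": "post", "_id": "2"}}),
    json.dumps({"title": "Short Honk: Elassandra", "views": 0}),
]) + "\n"
print(requests.post(f"{ES}/_bulk", data=bulk_body).json())

# Update API: partial update of a single document
print(requests.post(f"{ES}/articles/post/1/_update",
                    json={"doc": {"views": 1}}).json())
```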

Stephen E Arnold, July 17, 2016

Google: Algorithms Are Objective

July 17, 2016

I know that Google’s algorithms are tireless, objective numerical recipes. However, “Google: Downranking Online Piracy Sites in Search Results Has Led to a 89% Decrease in Traffic” sparked in my mind the notion that human intervention may be influencing some search result rankings. I highlighted these statements in the write up:

“Google does not proactively remove hyperlinks to any content unless first notified by copyright holders, but the tech giant says that it is now processing copyright removal notices in less than six hours on average…” I assume this work is performed by objective algorithms.

“…it is happy to demote links to pages that explicitly contain or link to content that infringes copyright.” Again, a machine process and, therefore, objective?

Human intervention in high volume flows of information is often difficult. If Google is not using machine processes, perhaps the company is forced to group sites and then have humans make decisions.
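Nobody outside Google knows how the demotion actually works. Purely as an illustration of the idea, here is a toy sketch in which a page’s score shrinks as valid copyright notices against its domain pile up; every rule, number, and URL in it is invented.

```python
# Toy illustration only: how a "demotion" signal *might* be folded into a
# relevance score. Nothing here reflects Google's actual ranking system.
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    relevance: float        # hypothetical topical relevance, 0..1
    takedown_notices: int   # valid copyright notices against the domain

def demoted_score(page: Page, penalty_per_notice: float = 0.05) -> float:
    # Each valid notice shaves a fixed fraction off the score (invented rule).
    demotion = min(0.9, penalty_per_notice * page.takedown_notices)
    return page.relevance * (1.0 - demotion)

results = [
    Page("https://legit-lyrics.example", 0.80, 0),
    Page("https://pirate-mirror.example", 0.95, 40),
]
for p in sorted(results, key=demoted_score, reverse=True):
    print(f"{demoted_score(p):.2f}  {p.url}")
```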

Artificial intelligence, are you not up to the task?

Stephen E Arnold, July 17, 2016

Short Honk: Elassandra

July 16, 2016

Just a factoid. There is now a version of Elasticsearch which is integrated with Cassandra. You can get the code for version 2.1.1-14 via GitHub. Just another example of the diffusion of the Elasticsearch system.
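A hedged sketch of what the integration is supposed to buy you: a document indexed through the familiar Elasticsearch REST interface becomes a row you can also read over Cassandra’s CQL interface. The host, ports, and the index-to-keyspace, type-to-table mapping shown here are assumptions for illustration, not something verified against the 2.1.1-14 build.

```python
# Hedged sketch: in Elassandra, the Elasticsearch REST API and Cassandra's
# CQL interface front the same data. Host, ports, and the index-to-keyspace /
# type-to-table mapping are assumptions for illustration.
import requests
from cassandra.cluster import Cluster   # pip install cassandra-driver

# 1) Index a document through the familiar Elasticsearch endpoint
doc = {"title": "Elassandra short honk", "views": 1}
requests.put("http://127.0.0.1:9200/blog/post/1", json=doc)

# 2) Read the same data back over CQL (index ~ keyspace, type ~ table)
session = Cluster(["127.0.0.1"]).connect("blog")
for row in session.execute("SELECT * FROM post"):
    print(row)
```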

Stephen E Arnold, July 16, 2016

Google Storage Lessons: Factoids with a Slice of Baloney

July 15, 2016

I read “Lessons To Learn From How Google Stores Its Data.” I noted a couple of interesting factoids (which I assume are spot on). The source is an “independent consultant and entrepreneur based out of Bangalore, India.”

The factoids:

  1. Google could be holding as much as 15 exabytes on their servers. That’s 15 million terrabytes [sic] of data which would be the equivalent of 30 million personal computers.
  2. “A typical database contains tables that perform specific tasks.”
  3. According to a paper published on the Google File System (GFS), the company duplicates each data indexed as many as three times. What this means is that if there are 20 petabytes of data indexed each day, Google will need to store as much as 60 petabytes of data.

As you digest these factoids, keep in mind the spelling issues, the obvious, and the reference to a decade-old Google article.
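A quick back-of-the-envelope check of factoids one and three, assuming a half-terabyte personal computer (implied by the comparison, not stated) and the three-way replication the GFS paper describes:

```python
# Back-of-the-envelope check of factoids 1 and 3 (the 0.5 TB-per-PC figure
# is an assumption implied by the comparison, not stated in the article).
EXABYTE_IN_TB = 1_000_000

total_tb = 15 * EXABYTE_IN_TB             # 15 exabytes -> 15 million TB
pcs = total_tb / 0.5                      # assuming a 0.5 TB personal computer
print(f"{total_tb:,} TB ~= {pcs:,.0f} PCs")   # 15,000,000 TB ~= 30,000,000 PCs

replication_factor = 3                    # the GFS paper's default
indexed_per_day_pb = 20
print(indexed_per_day_pb * replication_factor, "PB stored per day")  # 60 PB
```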

Now the baloney. Google keeps its code in one big thing. Google scatters other data hither and yon. Google struggles to retrieve specific items from its helter-skelter setup when asked to provide something to a person with a legitimate request.

In short, Google is like other large companies wrestling with new, old, and changed data. The difference is that Google has the money and almost enough staff to deal with the bumps in the information superhighway.

The Google sells online ads; it does not lead the world in each and every technology, including data management. Bummer, right?

Stephen E Arnold, July 15, 2016
