September 27, 2016
The article on GitHub titled The 9 Deep Learning Papers You Need To Know About (Understanding CNNs Part 3) is not about the global media giant CNN but about advances in computer vision and convolutional neural networks (CNNs). The article frames its discussion around the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which it calls the “annual Olympics of computer vision…where teams compete to see who has the best computer vision model for tasks such as classification, localization, detection and more.” The article explains that the 2012 winners and their network (AlexNet) revolutionized the field.
This was the first time a model performed so well on a historically difficult ImageNet dataset. Utilizing techniques that are still used today, such as data augmentation and dropout, this paper really illustrated the benefits of CNNs and backed them up with record breaking performance in the competition.
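Dropout and data augmentation, the two techniques the quote credits, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not AlexNet’s code: it uses inverted dropout (survivors are scaled up during training), a common modern formulation, and it represents an image as a list of pixel rows. The function names are hypothetical.

```python
import random

def inverted_dropout(activations, p_drop=0.5, training=True, rng=random):
    """Zero each activation with probability p_drop during training.

    Surviving values are scaled by 1 / (1 - p_drop) so the expected
    activation is unchanged; at inference time the input passes through.
    """
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def horizontal_flip(image):
    """Mirror an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in image]

def random_crop(image, size, rng=random):
    """Cut a size x size patch from a random position in the image."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]

if __name__ == "__main__":
    rng = random.Random(0)
    print(inverted_dropout([1.0, 2.0, 3.0, 4.0], rng=rng))
    print(horizontal_flip([[1, 2, 3], [4, 5, 6]]))  # [[3, 2, 1], [6, 5, 4]]
    print(random_crop([[1, 2, 3], [4, 5, 6], [7, 8, 9]], size=2, rng=rng))
```

Dropout randomly silences units so a network cannot lean on any single one; flips and crops manufacture extra training examples from each labeled image.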
In 2013, CNNs flooded in, and ZF Net won with an error rate of 11.2%, down from AlexNet’s 15.4%. Before AlexNet, the lowest error rate was 26.2%. The article also discusses other progress in network architecture, including VGG Net, which emphasized that depth and simplicity are essential for hierarchical data representation, and GoogLeNet, which tossed the deep and simple rule out the window and, with its Inception module, paved the way for more creative network structures.
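ILSVRC classification results are usually reported as top-5 error: a prediction counts as correct if the true label is among the model’s five highest-scoring guesses. Assuming the percentages above are top-5 error rates, this sketch shows how such a rate is computed and how large the improvements were in relative terms. The toy scores, labels, and function names are hypothetical.

```python
def top5_error(scores_per_image, true_labels):
    """Return the fraction of images whose true label is missing from the top 5.

    scores_per_image: one dict per image mapping class label -> model score.
    true_labels: the correct label for each image, in the same order.
    """
    misses = 0
    for scores, truth in zip(scores_per_image, true_labels):
        top5 = sorted(scores, key=scores.get, reverse=True)[:5]
        if truth not in top5:
            misses += 1
    return misses / len(true_labels)

def relative_reduction(old_rate, new_rate):
    """Share of the old error rate eliminated by the new one."""
    return (old_rate - new_rate) / old_rate

# Two toy images with six candidate classes each (hypothetical scores).
scores = [
    {"cat": 0.60, "dog": 0.20, "fox": 0.08, "cow": 0.06, "owl": 0.04, "car": 0.02},
    {"cat": 0.40, "dog": 0.25, "fox": 0.15, "cow": 0.10, "owl": 0.07, "car": 0.03},
]
labels = ["dog", "car"]  # "car" ranks sixth in the second image, so it is a miss

print(top5_error(scores, labels))               # 0.5
print(f"{relative_reduction(26.2, 15.4):.1%}")  # 41.2%
print(f"{relative_reduction(15.4, 11.2):.1%}")  # 27.3%
```

Read this way, AlexNet cut the previous best error by roughly 41%, and ZF Net trimmed AlexNet’s by roughly 27% a year later.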
Chelsea Kerwin, September 27, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/
September 15, 2016
I read “The CIA Just Invested in a Hot Startup That Makes Sense of Big Data.” I love the “just.” In-Q-Tel investments are not like bumping into a friend in Penn Station. Zoomdata, founded in 2012, has been making calls, raising venture funding (more than $45 million in four rounds from 21 investors), and staffing up to about 100 full time equivalents. With its headquarters in Reston, Virginia, the company is not exactly operating from a log cabin west of Paducah, Kentucky.
The write up explains:
Zoom Data uses something called Data Sharpening technology to deliver visual analytics from real-time or historical data. Instead of a user searching through an Excel file or creating a pivot table, Zoom Data puts what’s important into a custom dashboard so users can see what they need to know immediately.
What Zoomdata does is offer hope to its customers for less human fiddling with data and faster outputs of actionable intelligence. If you recall how IBM i2 and Palantir Gotham work, humans are needed. IBM even snagged Palantir’s jargon of AI for “augmented intelligence.”
In-Q-Tel wants more smart software with less dependence on expensive, hard to train, and often careless humans. When incoming rounds hit near a mobile operations center, it is possible to lose one’s train of thought.
Zoomdata has some Booz Allen DNA, some MIT RNA, and protein from other essential chemicals.
The write up mentions Palantir, but does not make explicit the need to reduce, to some degree, the human-centric approaches that are part of the major systems’ core architecture. You have nifty cloud stuff, but you have less nifty humans in most mission critical work processes.
To speed up the outputs, software should be the answer. An investment in Zoomdata delivers three messages to me here in rural Kentucky:
- In-Q-Tel continues to look for ways to meet the “less wait and less weight” requirement of those involved in operations. “Weight” refers to heavy, old-fashioned systems. “Wait” refers to the latency imposed by manual processes.
- Zoomdata and other investments are whips to the flanks of the BAE Systems, IBMs, and Palantirs chasing government contracts. The investment focuses attention not on scope changes but on figuring out how to deal with the unacceptable complexity and latency of many existing systems.
- In-Q-Tel has upped the value of Zoomdata. With consolidation in the commercial intelligence business rolling along at NASCAR speeds, it won’t take long before Zoomdata finds itself going to big company meetings to learn what the true costs of being acquired are.
For more information about Zoomdata, check out the paid-for reports at this link.
Stephen E Arnold, September 15, 2016
September 13, 2016
I was surprised by the information presented in “SAP Hana Implementation Pattern Research Yields Contradictory Results.” My goodness, I thought, an online publication actually presents some ideas that a high profile system may not be a cat fully dressed in pajamas.
The SAP Hana system is a database. The difference between Hana and the dozens of other allegedly next generation data management solutions is its “in memory, columnar database platform.” If you are not hip to the lingo of the database administrators who clutch many organizations by the throat, an in memory approach is faster than trucking back to a storage device. Think back to the 1990s and Eric Brewer or the teens who rolled out Pinpoint.
The columnar angle is that data are stored column by column, as if each column’s items were written on note cards and kept in their own stack. The mapping of the data is different from a row-oriented system. In a columnar structure, the data value itself acts as the key, which maps back to the row identification.
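The row-versus-column distinction can be sketched with plain Python lists and dictionaries. This toy example assumes nothing about Hana’s internals; it only shows that a columnar layout keeps each column’s values together, so an aggregate reads a single list, and that a value can be mapped back to the row positions that hold it. The table and field names are hypothetical.

```python
from collections import defaultdict

# Row-oriented layout: each record's fields are stored together.
rows = [
    {"id": 1, "region": "KY", "amount": 120},
    {"id": 2, "region": "VA", "amount": 75},
    {"id": 3, "region": "KY", "amount": 40},
]

def to_columns(records):
    """Pivot row records into one list per column (columnar layout)."""
    columns = defaultdict(list)
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return dict(columns)

def value_index(column):
    """Map each distinct value in a column to the row positions holding it."""
    index = defaultdict(list)
    for position, value in enumerate(column):
        index[value].append(position)
    return dict(index)

columns = to_columns(rows)
total = sum(columns["amount"])                      # touches only the "amount" column
ky_rows = value_index(columns["region"])["KY"]      # value -> row positions

print(total)    # 235
print(ky_rows)  # [0, 2]
```

Summing one column never touches the other fields, which is one reason column stores suit analytical queries; a row store remains handy when a whole record is read or written at once.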
The aforementioned article points to a mid tier consulting firm report. That report was produced by an outfit called Nucleus Research. Nucleus, according to the article, “revealed that 60 percent of SAP reference customers – mostly in the US – would not buy SAP technology again.” I understand that SAP engenders some excitement among its customers, but a mid tier consulting firm seems to be demonstrating considerable bravery if the data are accurate. Many mid tier consulting firms sand the rough edges off their reports.
The article then jumps to a report paid for by an SAP reseller, which obviously has a dog in the Nucleus fight. Another mid tier research outfit called Coleman Parks was hired to do another study. The research focused on 250 Hana license holders.
The results are interesting. I learned from the write up:
When asked what claims for Hana were credible, 92% of respondents said it reduced IT infrastructure costs, a further 87% stated it saved business costs. Some 98% of Hana projects came in on-budget, and 65% yet to roll out were confident of hitting budget.
Yep, happy campers who are using the system for online transactional processing and online analytical processing. No at home chefs tucking away their favorite recipes in Hana I surmise.
However, the report allegedly determined what I have known for more than a decade:
SAP technology is often deemed too complex, and its CEO Bill McDermott has been waging a public war against this complexity for the past few years, using the mantra Run Simple.
The rebuttal study identified another plus for Hana:
“We were surprised how satisfied the Hana license holders were. SAP has done a good job in making sure these projects work, and rate at which has got Hana out is amazing for such a large organization,” said Centiq director of technology and services Robin Webster. “We had heard a lot about Hana as shelfware, so we were surprised at the number saying they were live.”
From our Hana free environment in rural Kentucky, we think:
- Mid tier consulting firms often output contradictory findings when reviewing products or conducting research. If there is bias in algorithms, imagine what might lurk in the research team members’ approaches.
- High profile outfits like SAP can convince some of the folks with dogs in the fight to get involved in proving that good things come to those who have more research conducted
- Open source data management systems are plentiful. Big outfits like Hewlett Packard, IBM, and Oracle find themselves trying to generate the type of revenue associated with proprietary, closed data management products at a time when fresh faced computer science graduates just love free in memory systems like MemSQL and its kin.
SAP mounted an open source initiative which I learned about in “SAP Embraces Open Source Sort Of.” But the real message for me is that one can get mid tier research firms to do reports. Then one can pick the one that best presents a happy face to potential licensees.
Here in Harrod’s Creek, the high tech crowd tests software before writing checks. No consultants required.
Stephen E Arnold, September 13, 2016
August 18, 2016
The article on Datamation titled 7 Reasons Why Free Software Is Losing Influence investigates some of the causes of the major slowdown in FOSS (free and open source software). The article lays much of the blame at the feet of the leader of the Free Software Foundation (FSF), Richard Stallman. In spite of his major contributions to the free software movement, he is prickly and occasionally drops Joe Biden-esque gaffes detrimental to his cause. He also has trouble sticking to his message and keeping his cause relevant. The article explains,
“Over the last few years, Richard Stallman has denounced cloud computing, e-books, cell phones in general, and Android in particular. In each case, Stallman has raised issues of privacy and consumer rights that others all too often fail to mention. The trouble is, going on to ignore these new technologies solves nothing, and makes the free software movement more irrelevant in people’s lives. Many people are attracted to new technologies, and others are forced to use them because others are.”
In addition to Stallman’s difficult personality, which accounts for only a small part of the decline in the FSF’s influence, the article offers other explanations. Perhaps most importantly, the FSF is a tiny organization without the resources to achieve its numerous goals, such as sponsoring the GNU Project, promoting social activism, and running campaigns against DRM and Windows.
Chelsea Kerwin, August 18, 2016
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on August 23, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233019199/
July 27, 2016
Salesforce.com is a cloud computing company with the majority of its profits coming from customer relationship management and acquiring commercial social networking apps. According to PC World, Salesforce recently had a blackout, and the details were reported in: “Salesforce Outage Continues In Some Parts Of The US.” In early May, Salesforce was down for over twelve hours due to a file integrity issue in the NA14 database.
The outage occurred in the morning, with limited services restored later in the evening. Salesforce divides its customers into instances. The NA14 instance serves North America, and many of the customers who complained via Twitter are located in the US.
The exact details were:
“The database failure happened after “a successful site switch” of the NA14 instance “to resolve a service disruption that occurred between 00:47 to 02:39 UTC on May 10, 2016 due to a failure in the power distribution in the primary data center,” the company said. Later on Tuesday, Salesforce continued to report that users were still unable to access the service. It said it did not believe “at this point” that it would be able to repair the file integrity issue. Instead, it had shifted its focus to recovering from a prior backup, which had not been affected by the file integrity issues.”
Power outages like this are to be expected, and they will recur. Technology is only as reliable as its circuit breakers and its supply of electricity. This is why it is recommended to back up your files in more than one place.
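The advice to keep backups in more than one place can be sketched with Python’s standard library. This is a minimal illustration, not how Salesforce recovered NA14: it copies a file into several directories and compares SHA-256 checksums after each copy, one simple way to catch the kind of file integrity problem the company described. The function names and directory layout are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_to_many(source, destinations):
    """Copy source into every destination directory and verify each copy.

    Raises OSError if any copy's checksum differs from the original's.
    Returns the paths of the verified copies.
    """
    source = Path(source)
    expected = sha256_of(source)
    verified = []
    for directory in destinations:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copies contents plus metadata
        if sha256_of(target) != expected:
            raise OSError(f"checksum mismatch for {target}")
        verified.append(target)
    return verified
```

A checksum only proves each copy matches the source at backup time; if the source is already damaged, every copy will faithfully preserve the damage, which is why keeping older backups, as Salesforce did, matters too.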
July 22, 2016
A company with a long history is getting fresh scrutiny. An article at Fortune reports, “This Little-Known Firm Is Getting Rich Off Your Medical Data.” Writer Adam Tanner informs us:
“A global company based in Danbury, Connecticut, IMS buys bulk data from pharmacy chains such as CVS , doctor’s electronic record systems such as Allscripts, claims from insurers such as Blue Cross Blue Shield and from others who handle your health information. The data is anonymized—stripped from the identifiers that identify individuals. In turn, IMS sells insights from its more than half a billion patient dossiers mainly to drug companies.
“So-called health care data mining is a growing market—and one largely dominated by IMS. Last week, the company reported 2015 net income of $417 million on revenue of $2.9 billion, compared with a loss of $189 million in 2014 (an acquisition also boosted revenue over the year). ‘The outlook for this business remains strong,’ CEO Ari Bousbib said in announcing the earnings.”
IMS Health dates back to the 1950s, when a medical ad man sought to make a buck on drug-sales marketing reports. In the 1980s and ‘90s, the company thrived selling profiles of specific doctors’ prescribing patterns to pharmaceutical marketing folks. Later, they moved into aggregating information on individual patients (anonymized, of course, in accordance with HIPAA rules).
Despite those rules, some are concerned about patient privacy. IMS does not disclose how it compiles its patient dossiers, and it is possible that records could, somehow, someday, become identifiable. One solution would be to allow patients to opt out of contributing their records to the collection, anonymized or not, as marketing data firm Acxiom began doing in 2013.
Of course, it isn’t quite so simple for the consumer. Each health record system makes its own decisions about data sharing, so opting out could require changing doctors. On the other hand, many of us have little choice in our insurance provider, and a lot of those firms also share patient information. Will IMS move toward transparency, or continue to keep patients in the dark about the paths of their own medical data?
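IMS does not disclose how it anonymizes records, so the sketch below is a generic illustration, not its method. It assumes two common techniques: replacing a direct identifier with a keyed hash (HMAC-SHA256), and checking k-anonymity over quasi-identifiers such as a three-digit ZIP prefix and a birth year. The key, records, and field names are hypothetical. The point is the one raised above: stripping names does not by itself stop a unique combination of the remaining attributes from pointing to one person.

```python
import hashlib
import hmac
from collections import Counter

SECRET_KEY = b"hypothetical-secret"  # hypothetical key held by the data aggregator

def pseudonymize(patient_id, key=SECRET_KEY):
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()

def smallest_group(records, quasi_identifiers):
    """Return k, the size of the smallest group sharing the same quasi-identifiers.

    A dataset is k-anonymous when every combination appears at least k times;
    k == 1 means at least one record is unique and easier to re-identify.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

records = [
    {"patient": pseudonymize("A-100"), "zip3": "402", "birth_year": 1950, "drug": "statin"},
    {"patient": pseudonymize("A-101"), "zip3": "402", "birth_year": 1950, "drug": "insulin"},
    {"patient": pseudonymize("A-102"), "zip3": "403", "birth_year": 1987, "drug": "statin"},
]

k = smallest_group(records, ["zip3", "birth_year"])
print(k)  # 1: the 403/1987 record is the only one of its kind
```

The hashed IDs reveal no names, yet the third record is still the only one with its ZIP prefix and birth year; anyone who knows those two facts about a person can link that person to the drug column.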
Cynthia Murrell, July 22, 2016
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on July 26, 2016.
Information is at this link: http://bit.ly/29tVKpx.
July 20, 2016
The article titled An Intranet Success Story on BA Insight asserts that search is less about finding information than it is about user experience. In the context of Intranet networks and search, the article discusses what makes for an effective search engine. Nationwide Insurance, for example, forged a strong, award-winning intranet which was detailed in the article,
“Their “Find Anything” locator, navigation search bar, and extended refiners are all great examples of the proven patterns we preach at BA Insight…The focus for SPOT was clear. It’s expressed in three points: Simple consumer-like experience, One-stop shop for knowledge, Things to make our jobs easier… All three of these connect directly to search that actually works. The Nationwide project has generated clear, documented business results.”
The results include Engagement, Efficiency, and Cost Savings, the last in the form of $1.5M saved each year. What is most interesting about this article is the assumption that user experience (UX) trumps search results, or at least that search results are merely one aspect of search, not the alpha and omega. Rather, providing an intuitive, user-friendly experience should be the target. For Nationwide, part of that targeting process included identifying user experience as a priority. SPOT, Nationwide’s social intranet, is built on Yammer and SharePoint, and it is still one of the few successful and engaging intranet platforms.
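“Refiners,” in SharePoint terms, are facets: counts of a field’s values across the current result list that let a user narrow the list with a click. A minimal sketch, assuming a tiny in-memory document list with hypothetical fields; it does not reflect how BA Insight or SharePoint actually implement refiners.

```python
from collections import Counter

# Hypothetical intranet documents with two refinable fields.
documents = [
    {"title": "Claims handbook", "type": "PDF", "department": "Claims"},
    {"title": "Expense policy", "type": "Page", "department": "Finance"},
    {"title": "Claims FAQ", "type": "Page", "department": "Claims"},
]

def search(docs, term):
    """Naive case-insensitive title match standing in for a real search engine."""
    term = term.lower()
    return [d for d in docs if term in d["title"].lower()]

def refiner_counts(results, field):
    """Count how many current results carry each value of a field."""
    return Counter(d[field] for d in results)

def refine(results, field, value):
    """Keep only the results whose field equals the chosen refiner value."""
    return [d for d in results if d[field] == value]

hits = search(documents, "claims")
print(dict(refiner_counts(hits, "type")))                # {'PDF': 1, 'Page': 1}
print([d["title"] for d in refine(hits, "type", "Page")])  # ['Claims FAQ']
```

The refiner counts are computed from the current results rather than the whole collection, so the choices a user sees always lead somewhere non-empty.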
Chelsea Kerwin, July 20, 2016
July 16, 2016
Just a factoid. There is now a version of Elasticsearch which is integrated with Cassandra. You can get the code for version 2.1.1-14 via GitHub. Just another example of the diffusion of the Elasticsearch system.
Stephen E Arnold, July 16, 2016
July 15, 2016
I read “Lessons To Learn From How Google Stores Its Data.” I noted a couple of interesting factoids (which I assume are spot on). The source is an “independent consultant and entrepreneur based out of Bangalore, India.”
- Google could be holding as much as 15 exabytes on their servers. That’s 15 million terrabytes [sic] of data which would be the equivalent of 30 million personal computers.
- “A typical database contains tables that perform specific tasks.”
- According to a paper published on the Google File System (GFS), the company duplicates each data indexed as many as three times. What this means is that if there are 20 petabytes of data indexed each day, Google will need to store as much as 60 petabytes of data.
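The arithmetic in these factoids can be checked in a few lines, assuming decimal (SI) units. The inputs come from the quoted write-up; the per-PC figure is simply what its numbers imply.

```python
# Decimal (SI) units assumed: 1 EB = 1,000,000 TB and 1 TB = 1,000 GB.
TB_PER_EB = 1_000_000
GB_PER_TB = 1_000

stored_tb = 15 * TB_PER_EB                       # 15 EB expressed in terabytes
gb_per_pc = stored_tb * GB_PER_TB / 30_000_000   # spread across 30 million PCs

daily_indexed_pb = 20   # petabytes indexed per day, per the write-up
replicas = 3            # copies kept of each item, per the write-up
daily_stored_pb = daily_indexed_pb * replicas

print(stored_tb)        # 15000000
print(gb_per_pc)        # 500.0
print(daily_stored_pb)  # 60
```

The numbers are internally consistent: 15 exabytes over 30 million machines implies 500 GB per PC, and three copies of 20 petabytes is 60 petabytes.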
As you digest these factoids, keep in mind the spelling issues, the obvious, and the reference to a decade old Google article.
Now the baloney. Google keeps its code in one big thing. Google scatters other data hither and yon. Google struggles to retrieve specific items from its helter-skelter setup when asked to provide something to a person with a legitimate request.
In short, Google is like other large companies wrestling with new, old, and changed data. The difference is that Google has the money and almost enough staff to deal with the bumps in the information superhighway.
The Google sells online ads; it does not lead the world in each and every technology, including data management. Bummer, right?
Stephen E Arnold, July 15, 2016
July 7, 2016
I was cruising through the outputs of my Overflight system and spotted a write up with the fetching title “Big Data Services | @CloudExpo #BigData #IoT #M2M #ML #InternetOfThings.” Unreadable? Nah. Just a somewhat interesting attempt to get a marketing write up indexed by a Web search engine. Unfortunately humans have to get involved at some point. Thus, in my quest to learn what the heck Big Data is, I explored the content of the write up. What the article presents is a set of mini summaries of slide decks developed by assorted mavens, wizards, and experts. I dutifully viewed most of the information but tired quickly as I moved through a truly unusual article about a conference held in early June. I assume that the “news” is that the post-conference publicity is going to provide me with high value information in exchange for the time I invested in trying to figure out what the heck the title means.
I viewed a slide deck from an outfit called Cazena. You can view “Tech Primer: Big Data in the Cloud.” I want to highlight this deck because it contains one of the most amazing diagrams I have seen in months. Here’s the image:
Not only is the diagram enhanced by colors and lines, but the world it depicts is also a listing of data management products. The image was produced in June 2015 by a consulting firm and recycled in “Tech Primer” a year later.
I assume the folks in the audience benefited from the presentation of information from mid tier consulting firms. I concluded that the title of the article is actually pretty clear.
I wonder: Is a T-shirt available with the database graphic? If so, I want one. Perhaps I can search for the strings “#M2M #ML.”
Stephen E Arnold, July 7, 2016