Short Honk: Alphabet Google and Health Investments
March 24, 2016
Short honk: This is an important article in my opinion. “Sergey Brin’s Search for a Parkinson’s Cure” reports that Mr. Brin exercises. He dives. I noted this passage:
With every dive, Brin gains a little bit of leverage—leverage against a risk, looming somewhere out there, that someday he may develop the neurodegenerative disorder Parkinson’s disease. Buried deep within each cell in Brin’s body—in a gene called LRRK2, which sits on the 12th chromosome—is a genetic mutation that has been associated with higher rates of Parkinson’s.
Also, I highlighted this passage:
It sounds so pragmatic, so obvious, that you can almost miss a striking fact: Many philanthropists have funded research into diseases they themselves have been diagnosed with. But Brin is likely the first who, based on a genetic test, began funding scientific research in the hope of escaping a disease in the first place.
A number of questions zipped through my mind. I won’t raise them. Perhaps the write up explains the “solving death” project and provides some insight into various Alphabet Google investments. In short, an article with information of some import to those who seek to understand the Alphabet Google thing.
Stephen E Arnold, March 24, 2016
Confused about Hadoop, Spark, and MapReduce? Not Necessary Now
March 24, 2016
I read “MapReduce vs. Apache Spark vs. SQL: Your questions answered here and at #StrataHadoop.” The article strikes at the heart of the Big Data boomlet. The options one has are rich, varied, and infused with consequences.
According to the write up:
Forrester is predicting total market saturation for Hadoop in two years, and a growing number of users are leveraging Spark for its superior performance when compared to MapReduce.
Yikes! A mid tier consulting firm is predicting the future again. I almost stopped reading, but I was intrigued. Exactly what are the differences among these three systems, which appear to be really different? MapReduce is a bit of a golden oldie, and there is the pesky thought in my mind that Hadoop is a close relative of MapReduce. The Spark thing is an open source effort to create a system which runs quickly enough to make performance mesh with the idea that engineers have weekends.
The write up states:
As I mentioned in my previous post, we’re using this blog series to introduce some of the key technologies SAS will be highlighting at Strata Hadoop World. Each Q&A features the thought leaders you’ll be able to meet when you stop by the SAS booth #1022. Next up is Brian Kinnebrew who explains how new enhancements to SAS Data Loader for Hadoop can support Spark.
Yikes, yikes. The write up is a plea for booth traffic. In the booth a visitor can learn about the Hadoop, Spark, and MapReduce options.
The most interesting thing about the article is that it presents a series of questions and some SAS-skewed answers. The point is that SAS, the statistics company every graduate student in psychology learns to love, has a Data Loader Version 2.4 which is going to make life wonderful for the Big Data crowd.
I wondered, “Is this extract, transform, and load all over again?”
The answer is not to get tangled up in the substantive differences among Hadoop, Spark, and MapReduce, as the title of the article implied. The point is that one can use NoSQL and regular SQL.
So what did I learn about the differences among Hadoop, Spark, and MapReduce?
Nothing. Just content marketing without much content in my view.
SAS, let me know if you want me to explain the differences to someone in your organization.
Stephen E Arnold, March 24, 2016
Wikipedia Grants Users Better Search
March 24, 2016
Wikipedia is the de facto encyclopedia for separating fact from fiction, although academic circles shun its use (scholars do use it, but never cite it). Wikipedia does not usually make the news unless the story is tied to its fundraising campaign or WikiLeaks releases sensitive information meant to remain confidential. The Register tells us that Wikipedia is making the news for another reason in “Reluctant Wikipedia Lifts Lid On $2.5m Internet Search Engine Project.” Wikipedia is better associated with the cataloging and dissemination of knowledge, but in order to use that knowledge, it needs to be searchable.
Perhaps that is why the Wikimedia Foundation is “doing a Google” and will be investing a Knight Foundation Grant into a search-related project. The Wikimedia Foundation finally released information about the Knight Foundation Grant, which is dedicated to providing funds for companies invested in innovative solutions related to information, community, media, and engagement.
“The grant provides seed money for stage one of the Knowledge Engine, described as “a system for discovering reliable and trustworthy information on the Internet”. It’s all about search and federation. The discovery stage includes an exploration of prototypes of future versions of Wikipedia.org which are “open channels” rather than an encyclopedia, analysing the query-to-content path, and embedding the Wikipedia Knowledge Engine ‘via carriers and Original Equipment Manufacturers’.”
The discovery stage will last twelve months, ending in August 2016. The biggest risk for the search project would be if Google or Yahoo decided to invest in something similar.
What is interesting is that former Wiki worker Jimmy Wales denied the Wikimedia Foundation was working on a search engine via the Knowledge Engine. Wales has since left, and Andreas Kolbe reported in a Wikipedia Signpost article that the foundation is building a search engine. Readers were led to believe it would find information spread across the Wikipedia portals; rather, it is something much more powerful.
Here is what the actual grant is funding:
“To advance new models for finding information by supporting stage one development of the Knowledge Engine by Wikipedia, a system for discovering reliable and trustworthy public information on the Internet.”
It sounds like a search engine that provides true and verifiable search results, which is what academic scholars have been after for years! Wow! Wikipedia might actually be worth a citation now.
Whitney Grace, March 24, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
VPN Disables Right to Be Forgotten for Users in European Union
March 24, 2016
Individuals in the European Union have been granted legal protection to request that unwanted information about themselves be removed from search engines. An article from Wired, “In Europe, You’ll Need a VPN to See Real Google Search Results,” explains the latest on the European Union’s “right to be forgotten” laws. Formerly, privacy requests would only scrub sites with European country extensions like .fr, but now Google.com will filter results for privacy for those with a European IP address. However, European users can rely on a VPN to make their location appear as if it were elsewhere. The article offers context and insight:
“China has long had its “Great Firewall,” and countries like Russia and Brazil have tried to build their own barriers to the outside ‘net in recent years. These walls have always been quite porous thanks to VPNs. The only way to stop it would be for Google to simply stop allowing people to access its search engine via a VPN. That seems unlikely. But with Netflix leading the way in blocking access via VPNs, the Internet may yet fracture and localize.”
The demand for browsing the web using surreptitious methods, VPN or otherwise, only seems to be increasing. Whether motivations are to uncover personal information about certain individuals, watch Netflix content available in other countries or use forums on the Dark Web, the landscape of search appears to be changing in a major way.
Megan Feil, March 24, 2016
DeepGram: Audio Search in Lectures and Podcasts
March 23, 2016
I read “DeepGram Lets You Search through Lectures and Podcasts for Your Favorite Quotes.” I don’t think the system is available at this time. The article states:
Search engines make it easy to look through text files for specific words, but finding phrases and keywords in audio and video recordings could be a hassle. Fortunately, California-based startup DeepGram is working on a tool that will make this process simpler.
The hint is the “is working.” Not surprisingly, the system is infused with artificial intelligence. The process is to convert speech to text and then index the result.
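For readers who want to see what "convert speech to text and then index the result" might look like, here is a minimal sketch. It is not DeepGram's method, which the article does not describe. The transcript segments, the function names, and the timestamps are all invented for illustration, and no real speech recognizer is called; the sketch assumes that step has already produced timestamped text.

```python
from collections import defaultdict

# Hypothetical output of a speech-to-text step: (start_seconds, text) pairs.
# These segments are made up; no speech recognizer is involved here.
SEGMENTS = [
    (0.0, "welcome to the lecture on search"),
    (12.5, "audio search is hard"),
    (30.0, "indexing the text makes search possible"),
]


def build_index(segments):
    """Map each lowercase word to the sorted start times of segments containing it."""
    index = defaultdict(set)
    for start, text in segments:
        for word in text.lower().split():
            index[word].add(start)
    return {word: sorted(times) for word, times in index.items()}


def lookup(index, word):
    """Return the start times at which the word was spoken, or an empty list."""
    return index.get(word.lower(), [])


INDEX = build_index(SEGMENTS)
```

Calling `lookup(INDEX, "search")` returns the start time of every segment in which the word appears, which is enough to jump to the right spot in a recording. The hard parts at scale, as noted below, are the transcription itself and the processing power it demands, not this lookup table.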
Exalead had an interesting system seven or eight years ago. I am not sure what happened to that demonstration. My recollection is that the challenge is to have sufficient processing power to handle the volume of audio and video content available for indexing.
When an outfit like Google is not able to pull off a comprehensive search system for its audio and video content, my hunch is that the task for a robust volume of content might be a challenge.
But if there is sufficient money, engineering talent, and processing power, perhaps I will no longer have to watch serial videos and listen to lousy audio to figure out what some folks are trying to communicate in their presentations.
Stephen E Arnold, March 23, 2016
Amazon Web Services: Crushing the Competition?
March 23, 2016
I read “Attack! Run. WTF? A Decade of Enterprise Class Fear and Uncertainty with AWS.” I am not sure if Amazon’s Web Services’ business is being praised or criticized. Nevertheless, the write up has some interesting factoids. I highlighted these statements:
IBM’s Cloud Services
- IBM, … was so flabbergasted [when Amazon won a US government contract] that the Blue Shirts of Armonk decided on the old-school route to victory and filed a legal complaint asking the government to re-evaluate IBM’s deal against that of Amazon, which Big Blue later withdrew.
- Famed for re-inventing itself around software in the 1990s under Lou Gerstner, the majority of IBM’s focus for the 2000s was devoted to unloading the PC and the server businesses on China. The firm is now trapped in a maelstrom of transition, restructuring and layoffs. Like Microsoft, IBM seems to have believed AWS couldn’t happen to it, that what the world needed was the same server software and services. It was nearly seven years after AWS that IBM realized something was afoot – probably when it lost both the CIA deal and got slapped about its attempts to make the CIA love it – that Big Blue said it would spend $2bn buying computing player SoftLayer and in 2014 throw $1.2bn into a massive data centre expansion to host your data and compute.
Microsoft Cloud Services
- Azure succumbed to classic innovator’s dilemma: how to sell a new platform as a package and at a price to maximize revenue without cannibalizing the company’s actual main money-makers – PC and server software. After delayed starts under Ray Ozzie and Bob Muglia, the technology roadmap only really clicked under new CEO Satya Nadella and executive software nerd Scott Guthrie. One brought the CEO-level commitment, the other made Azure work for developers.
- Gartner today regards Azure as number two, behind AWS, and yet… According to Gartner’s incumbent Cloud Queen Lydia Leong, Azure lacks the polish of AWS.
Oracle Cloud Services
- Oracle, which bought Sun, preferred to play a Game of Thrones that was corporate M&A to hold onto its position in IT. Sadly, it chose wrong; Oracle spent $8.5bn on Sun but ultimately discontinued the company’s fledgling utility computing service. Hardware and Java was what Oracle wanted.
- Today, Oracle’s resultant hardware business makes just half the revenue of AWS and is shrinking – falling 13 per cent to $1.1bn – versus AWS’s 69 per cent growth last quarter to $2.4bn. That past complacency of Oracle’s CEO on cloud has put Oracle firmly in a pack of also rans behind AWS on platform cloud, with Oracle now throwing PR at a problem to convince Wall St it is credible as a provider of IT as a service.
And what about Amazon? The write up points out:
- AWS is still attacking – growing at a phenomenal rate, 71 per cent in its recent quarter to $2.4bn and 69 per cent for the year to $7.88bn. The appetite among enterprises for AWS’s style of technology and model of delivery clearly hasn’t yet been satiated.
- …the truth is AWS now has its fences across so much of the cloud, removing them isn’t an option. The big question then for AWS at the age of 10 is this: when will the old men of IT regain their wind? How big will be their counter-attack and will it be concerted? Will it pose a tangible threat and how would AWS respond?
I noted that Apple has shifted some of its cloud business to the Google from AWS. I assume the Board of Directors’ excitement is now behind the kids from Cupertino. What’s clear is that IBM and Oracle seem to face an uphill slog if I understand the write up. Read the original and decide for yourself. I love the WTF. Some stakeholders may be asking this question too.
Stephen E Arnold, March 23, 2016
Weekly Watson: IBM Watson Has a Sister
March 23, 2016
I read “In Africa, Watson’s Sister Lucy Is Growing Up with the Help of IBM’s Research Team.” I did not know that. According to the write up:
Lucy, named after the fossil ancestor Australopithecus afarensis, is more of a system than a sci-fi super machine. “Lucy is many things, but it’s not just one talking computer in a room,” said Dr. Kamal Bhattacharya, Director of IBM Research–Africa. “We are using Watson related technology and big data analytics to develop solutions to African problems.”
I have been to different countries in Africa a handful of times. I have seen some of the problems first hand. I learned from the description of Lucy, sister of IBM Watson, that:
On the execution side, IBM Research Africa has launched problem solving groups around issues such as education, infrastructure, health care, and economic inclusion. Partners include African universities, telcos, hospitals, tech startups, and the Kenyan ICT Authority.
Research is good. Research which helps people is good. My concern is that IBM remains mired in years of revenue challenges. Marketing, not generating benefits for its stakeholders, seems to be a core IBM Watson competency. Also, the company is improving its ability to terminate unneeded employees. Lucy, what’s the fix for declining IBM revenues?
I await word from Watson’s sister.
Stephen E Arnold, March 23, 2016
The Dark Web Cuts the Violence
March 23, 2016
Drug dealing is a shady business that takes place in a nefarious underground and runs discreetly under our noses. Along with drug dealing comes a variety of violence involving guns, criminal offenses, and often death. Countless people have lost their lives to violence related to drug dealing, and that does not even include the people who overdosed. Would you believe that the drug dealing violence is being curbed by the Dark Web? TechDirt reveals, “How The Dark Net Is Making Drug Purchases Safer By Eliminating Associated Violence And Improving Quality.”
The Dark Web is the Internet’s underbelly, where stolen information and sex trafficking victims are sold, terrorists mingle, and, of course, drugs are peddled. Who would have thought that the Dark Web would actually provide a beneficial service to society by sending drug dealers online and taking them off the streets? With the drug dealers goes the associated violence. There also appears to be a system of checks and balances, where drug users can leave feedback a la eBay. It pushes the drug quality up as well, but is that a good or bad thing?
“The new report comes from the European Monitoring Centre for Drugs and Drug Addiction, which is funded by the European Union, and, as usual, is accompanied by an official comment from the relevant EU commissioner. Unfortunately, Dimitris Avramopoulos, the European Commissioner for Migration, Home Affairs and Citizenship, trots out the usual unthinking reaction to drug sales that has made the long-running and totally futile “war on drugs” one of the most destructive and counterproductive policies ever devised:
‘We should stop the abuse of the Internet by those wanting to turn it into a drug market. Technology is offering fresh opportunities for law enforcement to tackle online drug markets and reduce threats to public health. Let us seize these opportunities to attack the problem head-on and reduce drug supply online.’”
The war on drugs is a futile fight, but illegal substances do not benefit anyone. While it is a boon to society for the crime to be taken off the streets, take into consideration that the Dark Web is also a breeding ground for crimes arguably worse than drug dealing.
Whitney Grace, March 23, 2016
Stanford Offers Course Overviewing Roots of the Google Algorithm
March 23, 2016
The course syllabus for Stanford’s Computer Science class titled CS 349: Data Mining, Search, and the World Wide Web on Stanford.edu provides an overview of some of the technologies and advances that led to Google search. The syllabus states,
“There has been a close collaboration between the Data Mining Group (MIDAS) and the Digital Libraries Group at Stanford in the area of Web research. It has culminated in the WebBase project whose aims are to maintain a local copy of the World Wide Web (or at least a substantial portion thereof) and to use it as a research tool for information retrieval, data mining, and other applications. This has led to the development of the PageRank algorithm, the Google search engine…”
The syllabus alone offers some extremely useful insights that could help students and laypeople understand the roots of Google search. Key inclusions are the Digital Equipment Corporation (DEC) and PageRank, the algorithm named for Larry Page that enabled Google to become Google. The algorithm ranks web pages based on how many other websites link to them and on how highly those linking sites themselves rank. Jon Kleinberg also played a key role by realizing that websites with lots of links (like a search engine) should also be seen as more important. The larger context of the course is data mining and information retrieval.
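The link-based ranking idea can be illustrated with a short sketch. This is a textbook-style power iteration, not Google's production system and not code from the Stanford syllabus. The four-page toy web, the function name, and the iteration count are assumptions made up for the example; the damping factor of 0.85 is the value commonly quoted for PageRank.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by power iteration; links maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: pages a, b, and d all link to c; c links back to a.
TOY_WEB = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
RANKS = pagerank(TOY_WEB)
```

In this toy web, page c collects links from three pages and ends up ranked highest. Page a has only one inbound link, but it comes from the highly ranked c, so a outranks b and d, which no page links to. That is the difference between counting links and weighing them.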
Chelsea Kerwin, March 23, 2016
Interview with Stephen E Arnold Reveals Insights about Content Processing
March 22, 2016
Nikola Danaylov of the Singularity Weblog interviewed technology and financial analyst Stephen E. Arnold on the latest episode of his podcast, Singularity 1 on 1. The interview, Stephen E. Arnold on Search Engines and Intelligence Gathering, offers thought-provoking ideas on important topics related to sectors — such as intelligence, enterprise search, and financial — which use indexing and content processing methods Arnold has worked with for over 50 years.
Arnold attributes the origins of his interest in technology to a programming challenge he sought and accepted from a computer science professor, outside of the realm of his college major of English. His focus on creating actionable software and his affinity for problem-solving of any nature led him to leave PhD work for a job with Halliburton Nuclear. His career includes employment at Booz, Allen & Hamilton, the Courier Journal & Louisville Times, and Ziff Communications, before starting ArnoldIT.com strategic information services in 1991. He co-founded and sold a search system to Lycos, Inc., worked with numerous organizations, including several intelligence and enforcement organizations such as the US Senate Police and the General Services Administration, and authored seven books and monographs on search related topics.
With a continued emphasis on search technologies, Arnold began his blog, Beyond Search, in 2008 aiming to provide an independent source of “information about what I think are problems or misstatements related to online search and content processing.” Speaking to the relevance of the blog to his current interest in the intelligence sector of search, he asserts:
“Finding information is the core of the intelligence process. It’s absolutely essential to understand answering questions on point and so someone can do the job and that’s been the theme of Beyond Search.”
As Danaylov notes, the concept of search encompasses several areas where information discovery is key for one audience or another, whether counter-terrorism, commercial, or other purposes. Arnold agrees,
“It’s exactly the same as what the professor wanted to do in 1962. He had a collection of Latin sermons. The only way to find anything was to look at sermons on microfilm. Whether it is cell phone intercepts, geospatial data, processing YouTube videos uploaded from a specific IP address– exactly the same problem and process. The difficulty that exists is that today we need to process data in a range of file types and at much higher speeds than ever anticipated, but the processes remain the same.”
Arnold explains the iterative nature of his work:
“The proof of the value of the legacy is I don’t really do anything new, I just keep following these themes. The Dark Web Notebook is very logical. This is a new content domain. And if you’re an intelligence or information professional, you want to know, how do you make headway in that space.”
Describing his most recent book, Dark Web Notebook, Arnold calls it “a cookbook for an investigator to access information on the Dark Web.” This monograph includes profiles of little-known firms which perform high-value Dark Web indexing and follows a book he authored in 2015 called CYBEROSINT: Next Generation Information Access.