Tech Is Not Elite
July 7, 2018
The top one percent is a silly notion. Tech is not elite. Only some technologists are elite. Proof, you ask? Navigate to “Mark Zuckerberg Tops Warren Buffett to Become the World’s Third-Richest Person.” The write up points out:
Zuckerberg, who trails only Amazon.com Inc. founder Jeff Bezos and Microsoft Corp. co-founder Bill Gates, eclipsed Buffett Friday as Facebook shares climbed 2.4 percent, according to the Bloomberg Billionaires Index.
Generalizations are not necessarily accurate.
Stephen E Arnold, July 7 2018
Upside to AI If You Have a Knowledge Job
July 6, 2018
I read “AI Technology Frees People from Repetitive Mental Work.” Those in the top one percent will find the message inspirational. If one is not in that elite group, the future may be filled with cat videos and less productive uses of one’s time.
The write up states:
AI technology can free people from repetitive, inefficient and heavy mental work.
The person offering this insight is Li Yanhong (aka Robin Li), who is the founder of Baidu. For some, he is a combination of Sergey Brin and Mark Zuckerberg. He is a standard bearer for Chinese innovation, innovation that is positioned as the world leader in new, bright, shiny things.
AI is going to contribute to manufacturing, retail, and that old Watson chestnut, medicine.
What’s interesting is that Baidu has its own AI chip, the Kunlun. Plus, unlike US outfits, Baidu has self driving buses which allegedly do not kill pedestrians or drive into concrete barriers.
That’s nifty.
I would like to share one thought which crossed my mind.
The AI revolution may trigger some push back from those who have little to do during the day. My recollection is that there is some overt agitation in France, Spain, and other European countries. I have heard that there are some complainers in the US as well. Protesting or publishing anti government blog posts in some countries is not a path to personal success.
What’s AI’s contribution to this state of affairs?
Quite a few people will have time to do non repetitive tasks and think up ways to fill their time, give their life purpose, and engage in interesting pursuits.
For the one percent, no problems. Wait. One example just crossed my mind: A Googler with some extra time. See this story, which I assume is semi accurate.
Maybe it is just 20 percent of the top one percent who might evade the benefits of AI?
Stephen E Arnold, July 7, 2018
Online Memory: What Is Out There?
July 6, 2018
Facebook is an excellent company for most people. However, there are a handful of people who struggle to accept Facebook’s approach to reality. What happens when a chunk of digital memory becomes almost permanent?
Consider the aftermath of the European Union’s “right to be forgotten” law, which allows people to petition search engines and other data aggregators to have search results about them permanently removed. While some believe this infringes on various forms of free speech, others believe this is a way for crime victims to reclaim their lives. Quartz shares how Google and Facebook are not the only Web companies being petitioned in the article, “Meet Profile Engine, The ‘Spammy’ Facebook Crawler Hated By People Who Want To Be Forgotten.”
According to the article, Facebook results were the ones most often removed from Google’s search engine, while the Web site with the second most removal requests is Profile Engine. Profile Engine started in 2007 and allows users to track down people on social networks. It used to be a Facebook search engine, but the Profile Engine declared that Facebook was “spammy” and did not make truthful statements. Interesting assertion.
Profile Engine and Facebook had an argument, which resulted in a court battle. The two companies split, but Facebook is contractually obligated to keep feeding Profile Engine results. Facebook does not do this. In the meantime, Profile Engine stopped updating content around 2011. Facebook is not the only one that finds the Profile Engine interesting. There are many posts online about how to remove yourself from Profile Engine.
“Profile Engine is perhaps the worst of its kind, but not the only one that people across Europe are trying to expunge themselves from. Badoo, a London-based social network for meeting new people, had 2,206 results removed. Yasni—”News, pictures & links for any person. Find anyone on the internet with the world’s largest free people search”—had almost 3,000 results suppressed through its French and German subsidiaries. In other words, this battle of ownership of personal data is not going away anytime soon.”
Profile Engine was donated to the Internet Archive, so now all the results are located there. Effort may be needed to get information removed from the Internet Archive. It takes time and patience for Google to forget. Facebook type content may be almost permanent as well.
Whitney Grace, July 6, 2018
Useful AI Tools and Frameworks
July 6, 2018
We have found a useful resource: DZone shares “10 Open-Source Tools/Frameworks for Artificial Intelligence.” We do like open-source software. The write-up discusses the advantages offered by each entry in detail, so navigate there to compare and contrast the options. For example, regarding the popular TensorFlow, writer Somanath Veettil describes:
“TensorFlow is an open-source software library, which was originally developed by researchers and engineers working on the Google Brain Team. TensorFlow is for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow provides multiple APIs. The lowest level API — TensorFlow Core — provides you with complete programming control. The higher level APIs are built on top of TensorFlow Core. These higher level APIs are typically easier to learn and use than TensorFlow Core. In addition, the higher level APIs make repetitive tasks easier and more consistent between different users. A high-level API like tf.estimator helps you manage data sets, estimators, training, and inference. The central unit of data in TensorFlow is the tensor. A tensor consists of a set of primitive values shaped into an array of any number of dimensions. A tensor’s rank is its number of dimensions.”
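The tensor and rank vocabulary in that passage is easier to grasp with a toy. What follows is a minimal sketch in plain Python, assuming nested lists stand in for tensors. It does not use the TensorFlow library, and the helper names `rank` and `shape` are made up for this illustration.

```python
def rank(value):
    """Count how many levels of nesting a list-based 'tensor' has."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        # Descend into the first element; an empty list ends the walk.
        value = value[0] if value else None
    return depth


def shape(value):
    """Return the length of each dimension, outermost first."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0] if value else None
    return dims


scalar = 3.0                                    # rank 0: a single value
vector = [1.0, 2.0, 3.0]                        # rank 1: one dimension
matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]     # rank 2: rows and columns
cube = [[[1.0], [2.0]], [[3.0], [4.0]]]         # rank 3: a stack of matrices

for name, tensor in [("scalar", scalar), ("vector", vector),
                     ("matrix", matrix), ("cube", cube)]:
    print(name, "rank", rank(tensor), "shape", shape(tensor))
```

The sketch assumes every nested list is rectangular, which is what "shaped into an array" implies in the quoted description.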
The rest of Veettil’s entries are these: Apache SystemML, Caffe, Apache Mahout, OpenNN, Torch, Neuroph, Deeplearning4j (the “j” is for Java), Mycroft, and OpenCog. I note that several options employ a neural network, but approach that technology in different ways. It is nice to have so many choices for implementing AI; now the challenge is to determine which system is best for one’s particular needs. This list could help with that.
Cynthia Murrell, July 6, 2018
Why So Few Search Vendors Index the Web?
July 5, 2018
How many companies are indexing the Surface Web, the Dark Web, and the other bits and pieces which comprise the accessible Internet?
The answer is, “Not many most people can name.”
Another question, “Why don’t more companies just index the Internet?”
The answer is, “Money, resources, time, expertise, and generating revenue.”
The write up from 2012, “How to Crawl a Quarter Billion Webpages in 40 Hours,” surfaced again after an absence of six years. The article remains valid even though the principal change in the last 72 months is the increased concentration of Google’s index. Microsoft, a company which insists that its Bing system provides an alternative to Google, has not significantly weakened Google’s market magnetism. Many of the systems which are marketed as Web indexes like Duckduckgo.com and Startpage.com are metasearch engines; that is, the users’ queries are passed to other services and may be supplemented with some original crawling. A bit of fiddling ensures that the results lists seem to be different. But there is a sameness to the result sets, particularly on popular queries. Yandex, the Russian Web search system, does a good job of handling certain sets of domains, but the overall coverage is not that different from what one can find in Google or its country centric indexes.
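To make the metasearch idea concrete, here is a minimal sketch of the merging step, assuming two upstream engines have already returned ranked URL lists. The function names, the sample URLs, and the reciprocal-rank scoring are assumptions chosen for illustration. Nothing here describes how any named service actually works.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a rough identity key so near-duplicates collapse."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat www. and mobile (m.) hosts as the same site; a crude assumption.
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    return host + parts.path.rstrip("/")


def merge_results(result_lists):
    """Merge ranked lists from several engines into one ranked list."""
    scores = {}
    first_seen = {}
    for results in result_lists:
        for position, url in enumerate(results):
            key = normalize(url)
            # Reciprocal-rank scoring: a top hit adds 1, the second adds 1/2.
            scores[key] = scores.get(key, 0.0) + 1.0 / (position + 1)
            first_seen.setdefault(key, url)
    # sorted() is stable, so ties keep the order in which keys first appeared.
    ranked = sorted(scores, key=lambda key: -scores[key])
    return [first_seen[key] for key in ranked]


# Hypothetical result lists from two upstream engines.
engine_a = ["https://www.example.com/news", "https://example.org/a"]
engine_b = ["https://m.example.com/news/", "https://example.net/b"]

print(merge_results([engine_a, engine_b]))
```

When the upstream engines agree on the popular pages, the merged list has little room to differ, which is one plausible reason for the sameness noted above.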
What’s interesting about “How to Crawl” from 2012 is the use of the Amazon system. This is important because the plumbing required to index the Internet can be large, complicated, and expensive.
Does Amazon still operate its A9 Web index? We have heard yes and no as an answer to this question. With a significant number of queries seeking product information, it makes sense to consider Amazon as a potential competitor to Bing, Google, and Yandex.
After rereading the “How to Crawl” paper, one thing jumps out. The notion that a quarter of a billion pages is a non trivial chunk of the Internet is interesting but a bit misleading. There may be upwards of 30 billion indexable Web pages. A large number of these content objects exist in mobile forms; thus, deduplication becomes an interesting issue. That’s why the Google has multiple indexes.
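The scale gap is easy to check with back-of-the-envelope arithmetic. The sketch below takes the quarter billion pages and 40 hours from the article title and the 30 billion figure from the estimate above. It assumes a single crawl running at a constant rate with no recrawling and no deduplication, which is a simplification.

```python
PAGES_CRAWLED = 250_000_000        # from the "How to Crawl" title
CRAWL_HOURS = 40                   # from the "How to Crawl" title
INDEXABLE_PAGES = 30_000_000_000   # rough estimate quoted in this post

# Sustained crawl rate implied by the 2012 experiment.
pages_per_second = PAGES_CRAWLED / (CRAWL_HOURS * 3600)

# Share of the estimated indexable Web covered by that crawl.
share_of_web = PAGES_CRAWLED / INDEXABLE_PAGES

# Time to reach every page once at the same rate.
hours_for_full_pass = INDEXABLE_PAGES // PAGES_CRAWLED * CRAWL_HOURS
days_for_full_pass = hours_for_full_pass / 24

print(f"Crawl rate: about {pages_per_second:,.0f} pages per second")
print(f"Share of the estimated Web: {share_of_web:.2%}")
print(f"One full pass at that rate: {hours_for_full_pass:,} hours "
      f"({days_for_full_pass:.0f} days)")
```

Under these assumptions the crawl touches less than one percent of the estimate, and one full pass would take about 200 days before any page is revisited.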
The big question becomes, “Is there another company able to compete with Google?”
After reading “How to Crawl” after a lapse of six years, the answer may be,
“Very, very few companies. And some of the outfits indexing the Surface and Hidden Internet may not make their activities public.”
Monocultures are okay but these can be vulnerable to something the monoculture cannot resist. Is Google like today’s banana? What happens if a blight attacks? One can shift to durian I suppose.
Stephen E Arnold, July 5, 2018
No University of Virginia Honor Code in Algeria
July 5, 2018
Anyone with a child or young family member probably knows about the looming threat of Internet-enabled cheating. From the scourge of plagiarism in papers to the use of phones in class to look up answers, there seems to be a runaway train in our schools and no way to stop it. Unless, of course, you live in Algeria. We learned more about their fascinating solution to this educational problem from a recent Science Alert story, “A Whole Country Just Turned Off Its Internet to Stop Students from Cheating on Exams.”
For six days, Algeria shut off the Internet so students could take their finals:
“It is of course a big step to take – but the country has a big problem with cheats. In 2016, some 300,000 students had to retake exams after papers were leaked early on the web and circulated around social media.
“Last year attempts were made to restrict access to social media platforms, but ultimately those measures weren’t effective enough – so this year the authorities are going all in. Both cell networks and broadband are getting switched off during the allotted periods.”
While it’s pretty extreme, we like Algeria’s moxie. This is a much more effective way to curb cheating than, say, banning wristwatches. Could this method work in a country like America? We’re willing to bet that grownups can’t live without their cat memes long enough to find out.
Patrick Roland, July 5, 2018
Oracle Responds to Amazon: Another Data Marketplace
July 5, 2018
For years, companies have been creating ways to capture and analyze their own data, but the Big Data field has evolved—now one can purchase valuable data instead of scraping one’s own. SaaS leader Oracle has posted a handy guide to their own Data Marketplace in their Help Center, as a subsection of their “Using Oracle Data Cloud” description. The introduction tells us:
“The Oracle Data Marketplace is the world’s largest third-party data marketplace and the standard for open and transparent audience data trading. It provides an ecosystem built on premium quality data, flexible and fair pricing, and scale that is unmatched in the industry. The result is the most comprehensive access to quality data available to target audiences at any stage of the purchase funnel. Oracle Data Marketplace data providers offer more than 30,000 data attributes to power your branding or direct marketing initiatives and let you connect with your target audience anywhere on the internet.”
We also noted this statement:
Access actionable audience data on more than 300 million users. That’s over 80% of the entire US internet population at your fingertips….Leverage a range of data to power in-market to business to demographic targeting, some of which are exclusive and not available anywhere else.
The goal is to sell data. Who buys data? We learned:
“Eighty percent of the top 20 ad networks, portals, trading desks, and creative optimizers leverage data from the Oracle Data Marketplace platform to run high-performance ad campaigns.”
Oracle will offer more than 200 “data partner solutions.” The write up includes two charts that summarize the available data, one for Oracle’s BlueKai platform (Oracle acquired BlueKai in 2014) and one for “branded data” – data supplied by prominent third-party aggregators from AcquireWeb to Webbula. A few names I recognize in between are Experian, Forbes, MasterCard, and TiVo Research.
Does this sound similar to Amazon’s “streaming data marketplace”? We think it does.
Which company is better positioned to take business from the credit checking outfits like TransUnion? Which company has more consumer data? Which company has focused on real time analytics?
Our bet? Well, we aren’t the type of people who visit casinos.
Cynthia Murrell, July 5, 2018
Markov: Two Brothers and Chaining Hope to a Single Method for Efficiency
July 4, 2018
I am no math guy. I am no Googler. I am just an old person related to a semi capable math person named V.I. Arnold. That Arnold knew of the Markov guys because those who assisted Kolmogorov sort of kept in touch with stochastic methods.
This is recent news in math history. Andrey Andreyevich Markov died in 1922 when my uncle was a very young math prodigy. His brother Vladimir died in 1897.
Who cares?
I do sort of.
I read “Can Markov Logic Take Machine Learning to the Next Level?” From my point of view, the short answer is, “Not really.”
Machine learning requires a number of numerical recipes. Truth be told, most of these methods have been around a long time. The methods are taught by university profs and even discussed in IBM sales engineers’ briefings. (Yep, at least they were once upon a time.)
The write up explains Pedro Domingos’ insight. The article does not make clear that Dr. Domingos’ work has influenced the Google smart software effort. In fact, Google has, like Amazon, deep affection for the University of Washington. Dr. Jeff Dean, I have heard, shares a warm spot in his heart for the university.
The write up presents some of Dr. Domingos’ insights about Markov and Markov logic.
The key point for me is that as useful as the Russian brothers’ ideas are, there is more to machine learning than a single approach.
In fact, I find this statement from the article interesting:
The productivity advantages of Markov Logic may be too great to ignore. A deep learning machine that takes tens of thousands of lines of code in a traditional language could be expressed with just a few Markov Logic formulas, Domingos says. “It’s not completely push-button. Markov Logic is not at that stage. There’s still the usual playing around with things you have to do,” he says. “But your productivity and how far you can get is just at a different level.”
A few formulas. Interesting idea. How will one explain what comes out of a machine learning process if regulations about transparency for smart software become a reality?
Those who want to understand what smart software does may have to become familiar with the work of the Markov guys. That’s probably unrealistic. Therefore, figuring out how machine intelligence works is likely to be a challenge.
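For a feel of what “a few formulas” means, here is a toy sketch. It assumes one weighted rule, “smoking implies cancer,” about a single hypothetical person, and it uses the standard Markov Logic definition in which a world’s probability is proportional to the exponential of the summed weights of the formulas it satisfies. The atom names, the weight of 1.5, and the helper functions are invented for illustration. This is a propositional simplification, not Dr. Domingos’ software.

```python
import math
from itertools import product


def implies(a, b):
    """Logical implication: false only when a holds and b does not."""
    return (not a) or b


# Each entry pairs a weight with a test that a possible world may satisfy.
# The single rule and its weight of 1.5 are assumptions for this sketch.
FORMULAS = [
    (1.5, lambda world: implies(world["smokes"], world["cancer"])),
]
ATOMS = ["smokes", "cancer"]


def world_distribution(formulas, atoms):
    """Return (world, probability) pairs over every truth assignment."""
    worlds = [dict(zip(atoms, values))
              for values in product([False, True], repeat=len(atoms))]
    # Unnormalized score: exp of the total weight of satisfied formulas.
    scores = [math.exp(sum(weight for weight, holds in formulas if holds(world)))
              for world in worlds]
    partition = sum(scores)  # the normalizing constant, usually written Z
    return [(world, score / partition) for world, score in zip(worlds, scores)]


for world, probability in world_distribution(FORMULAS, ATOMS):
    print(world, round(probability, 3))
```

The world that breaks the rule is not ruled out. It is just less probable than the others. The formula and its weight are readable, but the output is a probability distribution, and that is the part someone would have to explain to a regulator.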
Now let’s get that accuracy of facial recognition systems above the 75 percent level on University of Washington tests.
Stephen E Arnold, July 4, 2018
Google Cloud: Dissipating with a Chance for Unsettled Weather
July 4, 2018
I love Google. It’s relevant. I am not sure the folks at CNBC share my enthusiasm. Navigate to “Google Cloud’s COO Has Left after Less Than a Year.” To be exact, I think Diane Bryant, Google Cloud Chief Operating Officer, was a Googler for about 13 months. In Internet dog years, that’s a long time, is it not? Maybe not? Here’s a different employment number: Seven months.
I highlighted this passage:
Bryant’s hire was a win for the search giant’s cloud business, which is widely seen as No. 3 in the public cloud market, behind Amazon and Microsoft. As the relative newcomer in the space, Google Cloud’s challenge has been to prove its capabilities to large businesses, though Greene has said that there are no more “deal blockers” in the way of new contracts.
Fact, snark, digital corned beef hash?
I don’t know. I continue to wonder if Alphabet Google’s approach to management is going to allow the company to keep pace with and then surpass the Bezos buck machine.
I will be reviewing my Amazon research at the September Telestrategies ISS LE and intelligence conference in Washington, DC. I will focus on both management and technical tactics.
I am not sure there will be a reference to Google until I have a sense that it is managed for sustainable innovation, in the cloud and on the ground as it were.
Stephen E Arnold, July 4, 2018
Socially Dark: Communities in the Shade
July 4, 2018
Security-analysis firm Recorded Future demonstrates its capabilities in its recent blog post, “Dark Networks: Social Network Analysis of Dark Web Communities.” The write-up describes the methodology researchers used to mine social network data for clues to Dark Web social circles, so navigate there for the technical details. Data engineer Adrian Tirados delineates the three clusters, or communities, they found:
“* Low-Tier Underground Forums: Usually free and open-access forums, with many novice members.
* Higher-Tier Dark Web Forums: The access is generally restricted through things like strict membership vetting, only hosting the site on Tor, or other requirements for access. Members of these sites are experienced and regarded as reputable by other members of the criminal community. Rippers (members that scam other members without delivering a good or service) are scarce, and rigorous banning is enforced in order to protect the community.
* Dark Web Markets: Market sites with listings of illicit services and goods, stolen credentials, credit card dumps, etc. The access is usually open, meaning that they do not require an existing member to vouch for new registrants.
The presence of edges between the two forum clusters versus the almost complete disconnection of the market cluster shows that there is a greater division between forums and markets than there is between low-tier and higher-tier forums.”
The piece goes on to speculate about the difference, suggesting high-tier forum users visit lower-tier forums for information and self-promotion. Those visiting marketplaces seem uninterested in what the forums have to offer (though, of course, they could be checking them out under different names, Tirados allows). Launched in 2009, Recorded Future is headquartered in Somerville, Massachusetts, with offices in London; Washington, D.C.; and Göteborg, Sweden. They are also hiring as of this writing, in case any readers are interested.
Cynthia Murrell, July 4, 2018