Amazon Intelligence Gets a New Data Stream
June 28, 2018
I read “Amazon’s New Blue Crew.” The idea is that Amazon can disintermediate FedEx, UPS (the outfit with the double parking brown trucks), and the US Postal Service.
On the surface, the idea makes sense. Push down delivery to small outfits. Subsidize them indirectly and directly. Reduce costs and eliminate intermediaries not directly linked to Amazon.
FedEx, UPS, and the USPS are not the most nimble outfits around. I used to get FedEx envelopes every day or two. I haven’t seen one of those for months. Shipping via UPS is a hassle. I fill out forms and have to manage odd slips of paper with arcane codes on them. The US Postal Service works well for letters, but I have noticed some returns for “addresses not found.” One was an address in the city in which I live. I put the letter in the recipient’s mailbox. That worked.
The write up reports:
The new program lets anyone run their own package delivery fleet of up to 40 vehicles with up to 100 employees. Amazon works with the entrepreneurs — referred to as “Delivery Service Partners” — and pays them to deliver packages while providing discounts on vehicles, uniforms, fuel, insurance, and more. They operate their own businesses and hire their own employees, though Amazon requires them to offer health care, paid time off, and competitive wages. Amazon said entrepreneurs can get started with as low as $10,000 and earn up to $300,000 annually in profit.
Now what’s the connection to Amazon streaming data services and the company’s intelligence efforts? Several hypotheses come to mind:
- Amazon obtains fine grained detail about purchases and delivery locations. These data can no longer be captured by a non Amazon delivery service.
- The data can be cross correlated; for example, purchasers of a Kindle title can be matched with the delivery of a particular product, such as hydrogen peroxide.
- Amazon’s delivery data make it possible to capture metadata about delivery time, whether a person accepted the package or it was left at the door, and other location details such as a blocked entrance.
A few people dropping off packages is not particularly useful. Scale up the service across Amazon operations in the continental states or a broader swath of territory and the delivery service becomes a useful source of high value information.
FedEx and UPS are ripe for disruption. But so is the streaming intelligence sector. Worth monitoring this ostensible common sense delivery play.
Stephen E Arnold, June 28, 2018
Health Care: Data an Issue
June 28, 2018
Healthcare analytics is helping doctors and patients make decisions in ways we never could have dreamed. From helping keep your heart healthy to deciding when to have major surgery, analytic numbers make a big impact. However, that data needs to be perfect in order to work, according to a recent ZDNet story, “Google AI is Very Good at Predicting When a Patient is Going to Die.”
According to the story:
“As noted, 80 percent of the effort in creating an analytic model is in cleaning the data, so it could provide a way to scale up predictive models, assuming the data is available to mine…. “This technique would allow clinicians to check whether a prediction is based on credible facts and address concerns about so-called ‘black-box’ methods that don’t explain why a prediction has been made.”
This really illustrates how powerful clean data can be in the health field. However, cleaning data is just about the most misunderstood wallflower in the often tedious world of machine learning and data science—not just in healthcare. According to Entrepreneur magazine, the act of filling in blanks, removing outliers, and basically looking at all the data to make sure it will be accurate, is the most important part of the process and also the hardest role to fill on a team.
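The cleaning steps named above, filling in blanks and removing outliers, can be sketched in a few lines. This is a minimal illustration and not the method used in the Google work or described by Entrepreneur: the function name `clean`, the sample heart-rate readings, and the 1.5 x interquartile-range rule for spotting outliers are all assumptions made for the example.

```python
from statistics import median, quantiles


def clean(readings, k=1.5):
    """Drop outliers beyond k * IQR from the quartiles, then fill blanks (None) with the median.

    Hypothetical helper for illustration; real pipelines need domain-specific rules.
    """
    known = [r for r in readings if r is not None]
    if len(known) < 2:
        return known  # too little data to estimate quartiles, so just drop the blanks
    q1, _, q3 = quantiles(known, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    # Keep blanks for now; discard only known values outside the fences.
    kept = [r for r in readings if r is None or low <= r <= high]
    fill = median(r for r in kept if r is not None)
    return [fill if r is None else r for r in kept]


# Made-up heart-rate readings with two blanks and one implausible value (300).
sample = [72, 75, None, 70, 74, 300, 71, None, 73]
print(clean(sample))
```

Outliers are removed before the blanks are filled so that an implausible reading cannot influence the fill value. Whether a median fill is appropriate at all depends on the data, which is part of why the job is hard.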
Garbage in, garbage out. True decades ago. True today. How do we know? Just ask one of IBM Watson’s former health care specialists. Querying patients who were on the wrong end of a smart output may be helpful as well.
Patrick Roland, June 28, 2018
Who Guesses Better: Humans or Smart Software
March 28, 2018
MBAs are likely to pay close attention to smart software which makes decisions about which start up or stock to back.
With all the hand wringing about how artificial intelligence is going to put a lot of people out of work and drastically change our future landscape, it’s almost as if commentators are making it a given that humans are inferior. These writers and thinkers don’t seem to have any faith that our brains can do the heavy lifting too. CNBC recently found a niche where maybe we simple men and women can keep up thanks to…research, of course. We learned more in the article, “Doing Your Homework Does Lead to Better Investing returns.”
According to the story:
“…sophisticated hedge-fund managers are simply more skilled at processing swaths of information and data, their advantage may be more in their ability to match private data with public disclosures and SEC filings. ‘We look at the people who do robotic downloading. The people who use it suggests that hedge funds are going out and that they’re getting public information whenever they need.’”
It’s a great angle, for sure: with endless hours of research, our investments can turn to gold. However, this overlooks the idea that there may be flaws with the data itself. What if you are using biased info or downright bad data?
Perhaps the humans are better at picking winners than smart software. Data are not created equal. Smart software may incur a penalty because of flawed inputs. Bad data can cripple some data analytics outputs.
Net net, as the MBAs say, data have to be reliable. For now, bet on the human when it comes to deciding about investments.
Patrick Roland, March 28, 2018
Importance of Good Data to AI Widely Underappreciated
March 27, 2018
Reliance on AI has now become embedded in our culture, even as we struggle with issues of algorithmic bias and data-driven discrimination. Tech news site CIO reminds us, “AI’s Biggest Risk Factor: Data Gone Wrong.” In the detailed article, journalist Maria Korolov begins with some early examples of “AI gone bad” that have already occurred, and explains how this happens; hard-to-access data, biases lurking within training sets, and faked data are all concerns. So is building an effective team of data management workers who know what they are doing. Regarding the importance of good data, Korolov writes:
Ninety percent of AI is data logistics, says JJ Guy, CTO at Jask, an AI-based cybersecurity startup. All the major AI advances have been fueled by advances in data sets, he says. ‘The algorithms are easy and interesting, because they are clean, simple and discrete problems,’ he says. ‘Collecting, classifying and labeling datasets used to train the algorithms is the grunt work that’s difficult — especially datasets comprehensive enough to reflect the real world.’… However, companies often don’t realize the importance of good data until they have already started their AI projects. ‘Most organizations simply don’t recognize this as a problem,’ says Michele Goetz, an analyst at Forrester Research. ‘When asked about challenges expected with AI, having well curated collections of data for training AI was at the bottom of the list.’ According to a survey conducted by Forrester last year, only 17 percent of respondents say that their biggest challenge was that they didn’t ‘have a well-curated collection of that to train an AI system.’
Eliminating bias gleaned from training sets (like one AI’s conclusion that anyone who’s cooking must be a woman) is tricky, but certain measures could help. For example, tools that track how an algorithm came to a certain conclusion can help developers correct its impression. Also, independent auditors bring in a fresh perspective. These delicate concerns are part of why, says Korolov, AI companies are “taking it slow.” This is slow? We’d better hang on to our hats whenever (they decide) they’ve gotten a handle on these issues.
Cynthia Murrell, March 27, 2018
Million Short: A Metasearch Option
March 22, 2018
An interview at Forbes delves into the story behind Million Short, an alternative to Google for Internet Search. As concerns grow about online privacy, information accuracy, and filter bubbles, options that grant the user more control appeal to many. Contributor Julian Mitchell interviews Million Short founder and CEO Sanjay Arora in his piece, “This Search Engine Startup Helps You Find What Google Is Missing.” Mitchell informs us:
Founded in 2012, Million Short is an innovative search engine that takes a new and focused approach to organizing, accessing, and discovering data on the internet. The Toronto-based company aims to provide greater choices to users seeking information by magnifying the public’s access to data online. Cutting through the clutter of popular searches, most-viewed sites and sponsored suggestions, Million Short allows users to remove up to the top one million sites from the search set. Removing ‘an entire slice of the web’, the company hopes to balance the playing field for sites that may be new, suffer from poor SEO, have competitive keywords, or operate a small marketing budget. Million Short Founder and CEO Sanjay Arora shares the vision behind his company, overthrowing Google’s search engine monopoly, and his insight into the future of finding information online.
The subsequent interview gets into details, like Arora’s original motivation for creating Million Short: search is too important to be dominated by just a few companies, he insists. The pair explores both advantages and challenges the company has seen, as well as a look to the future. See the article for more.
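The core idea, removing the most popular sites from the result set, can be sketched as a simple filter. This is not Million Short’s implementation, which the article does not describe; the function `drop_top_sites`, the popularity table, and the `.example` domains are all hypothetical.

```python
from urllib.parse import urlparse


def drop_top_sites(results, popularity_rank, cutoff):
    """Return the result URLs whose domain is not among the `cutoff` most popular sites.

    `popularity_rank` maps a domain to its rank (1 = most popular). It is an
    assumed input; unranked domains are treated as obscure and kept.
    """
    kept = []
    for url in results:
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        rank = popularity_rank.get(domain)
        if rank is None or rank > cutoff:
            kept.append(url)
    return kept


# Made-up ranking and results for illustration only.
ranks = {"bigsite.example": 1, "midsite.example": 500, "smallshop.example": 2_000_000}
results = [
    "https://www.bigsite.example/a",
    "https://midsite.example/b",
    "https://smallshop.example/c",
    "https://unranked.example/d",
]
print(drop_top_sites(results, ranks, cutoff=1000))
```

Raising or lowering `cutoff` corresponds to the user choosing how large a slice of the popular web to remove.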
Cynthia Murrell, March 22, 2018
Reddit Turns to Bing for AI Prowess
March 19, 2018
This is an interesting development: MSPoweruser announces, “Reddit Partners with Microsoft and Bing for AI Tools.” We’d thought Reddit was thrilled with Solr. Reddit co-founder Alexis Ohanian announced the partnership at the Everyday AI event in San Francisco, saying his company required “AI heavy lifting” to analyze the incredible amounts of data it collects. For its part, Bing gets access to valuable data. Writer Surur tells us:
The partnership will benefit both parties with Reddit contributing content to Bing such as AMAs and advertising upcoming AMAs and Reddit Answers and Microsoft making subreddit content more visible in their search results. Now when searching for a subreddit in Bing it will deliver a live snapshot of the top threads in the subreddit. Ohanian noted that Reddit is the largest answer database of nuanced, verified answers, offering an amazing resource to Bing. He noted that the Bing partnership was like a crown jewel for Reddit and just scratches the surface of what is possible with Microsoft’s AI expertise and Reddit data. For companies who use Reddit for professional and commercial reasons, Reddit will be offering the Power BI suite of solution templates for brand management and targeting on Reddit which will enable brands, marketers, and budget owners to quickly analyze their Reddit footprint and determine how, where, and with whom to engage in the Reddit community.
With 330 million active monthly users, Reddit is about the same size as Twitter; that is indeed a lot of data. Surur points us to Reddit’s blog post on the subject for more information.
Cynthia Murrell, March 19, 2018
New SEO Predictions May Just Be Spot On
March 7, 2018
What will 2018 bring us? If the past twelve months were any indication, we have no idea what will hit next. However, that doesn’t stop the experts from trying to cash in on their Nostradamus abilities. Some of the predictions actually sound pretty plausible, like those in the Search Engine Journal article, “47 Experts on the Top SEO Trends For 2018.”
There are some real longshots on the list, but also some really insightful thoughts like:
In 2018 there will be an even bigger focus on machine learning and “SEO from data.” Of course, the amplification side of things will continue to integrate increasingly with genuine public relations exercises rather than shallow-relationship link building, which will become increasingly easy to detect by search engines.
Something which was troubling about 2017, and as we head into 2018, is the new wave of organizations merely bolting on SEO as a service without any real appreciation of structuring site architectures and content for both humans and search engine understanding. While social media is absolutely essential as a means of reaching influencers and disrupting a conversation to gain traction, grow trust and positive sentiment, those who do not take the time to learn about how information is extracted for search too may be disappointed.
We especially agree with how the importance of SEO will grow in the new year. Innovative organizations are finding amazing new ways to manipulate the data and we don’t expect that to stop. It’ll be interesting to see where we stand twelve months from now.
Patrick Roland, March 7, 2018
Universal Text Translation Is the Next Milestone for AI
February 9, 2018
As the globe gets smaller, individuals are in more contact with people who don’t speak their language. Or, we are reading information written in a foreign language. Programs like Google Translate are flawed at best and it is clear this is a niche waiting to be filled. With the increase of AI, it looks like that is about to happen, according to a recent GCN article, “IARPA Contracts for Universal Text Translator.”
According to the article:
The Intelligence Advanced Research Projects Activity is a step closer to developing a universal text translator that will eventually allow English speakers to search through multilanguage data sources — such as social media, newswires and press reports — and retrieve results in English.
The intelligence community’s research arm awarded research and performance monitoring contracts for its Machine Translation for English Retrieval of Information in Any Language program to teams headed by leading research universities paired with federal technology contractors.
Intelligence agencies, said IARPA project managers in a statement in late December, grapple with an increasingly multilingual, worldwide data pool to do their analytic work. Most of those languages, they said, have few or no automated tools for cross-language data mining.
This sounds like a very promising opportunity to get everyone speaking the same language. However, we think there is still a lot of room for error. We are hedging our bets on Unbabel’s AI translation software that is backed up by human editors. (They raised $23M, so they must be doing something right.) That human angle seems to be the hinge on which someone’s success in this rich field will turn.
Patrick Roland, February 9, 2018
Data Governance is the Hot Term in Tech Now
February 5, 2018
Data governance is a headache many tech companies have to contend with. With all the advances in big data and search, how can we possibly make sense of this rush of information? Thankfully, there are new data governance advances that aim to help. We learned more from a recent TopQuadrant story, “How Does SHACL Support Data Governance.”
According to the story:
“SHACL (Shapes Constraint Language) is a powerful, recently released W3C standard for data modeling, ontology design, data validation, inferencing and data transformation. In this post, we explore some important ways in which SHACL can be used to support capabilities needed for data governance.
Below, each business capability or value relevant to data governance is introduced with a brief description, followed by an explanation of how the capability is supported by SHACL, accompanied by a few specific examples from the use of SHACL in TopBraid Enterprise Data Governance.”
So, governance is a great way for IT and business to communicate better and wade through the data. Others are starting to take notice and SHACL is not the only solution. In fact, there is a wealth of options available; you just have to know where to look. Regardless, your business is going to have to take governance seriously and it’s better to start sooner rather than later.
Patrick Roland, February 5, 2018
Averaging Information Is Not Cutting It Anymore
January 16, 2018
Here is something interesting that comes after headlines like “People From Around The Globe Met For The First Flat Earth Conference” and claims that white supremacists are gaining more power. Frontiers Media shares “Rescuing Collective Wisdom When The Average Group Opinion Is Wrong,” an article that pokes fun at the fanaticism running rampant in the news. Beyond that fanaticism, there is a real concern with averaging in data science and other fields that rely heavily on data.
The article breaks down the different ways averaging is used and the different theorems that are developed from it. The introduction is a bit wordy but it sets the tone:
The total knowledge contained within a collective supersedes the knowledge of even its most intelligent member. Yet the collective knowledge will remain inaccessible to us unless we are able to find efficient knowledge aggregation methods that produce reliable decisions based on the behavior or opinions of the collective’s members. It is often stated that simple averaging of a pool of opinions is a good and in many cases the optimal way to extract knowledge from a crowd. The method of averaging has been applied to analysis of decision-making in very different fields, such as forecasting, collective animal behavior, individual psychology, and machine learning. Two mathematical theorems, Condorcet’s theorem and Jensen’s inequality, provide a general theoretical justification for the averaging procedure. Yet the necessary conditions which guarantee the applicability of these theorems are often not met in practice. Under such circumstances, averaging can lead to suboptimal and sometimes very poor performance. Practitioners in many different fields have independently developed procedures to counteract the failures of averaging. We review such knowledge aggregation procedures and interpret the methods in the light of a statistical decision theory framework to explain when their application is justified. Our analysis indicates that in the ideal case, there should be a matching between the aggregation procedure and the nature of the knowledge distribution, correlations, and associated error costs.
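The abstract’s point about Condorcet’s theorem can be made concrete with a short calculation. In its majority-vote form, the theorem assumes that each member of the group decides independently and is right with the same probability p. The sketch below computes the exact chance that a majority is right under those assumptions; the function name `majority_correct` and the chosen group sizes are ours, not the paper’s.

```python
from math import comb


def majority_correct(n, p):
    """Probability that more than half of n independent voters are right,
    when each voter is right with probability p (use odd n to avoid ties)."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


# When individuals are better than chance, a bigger crowd helps.
print(majority_correct(3, 0.6), majority_correct(101, 0.6))

# When individuals are worse than chance, a bigger crowd makes things worse.
print(majority_correct(3, 0.4), majority_correct(101, 0.4))
```

With p above 0.5 the crowd beats the individual, and more so as it grows. With p below 0.5 the same aggregation amplifies the error. That is one simple way the conditions the abstract mentions can fail in practice; correlated opinions are another, which this independent-voter sketch does not model.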
Understanding how data can be corrupted is half the battle of figuring out how to correct the problem. This is one of the complications related to artificial intelligence and machine learning. One example is trying to build sentiment analysis engines. These require terabytes of data and the Internet provides an endless supply, but the usual result is that the sentiment analysis engines end up racist, misogynist, and all around trolls. It might lead to giggles but does not yield very accurate results.
Whitney Grace, January 17, 2018