Big Data and Value
May 19, 2016
I read “The Real Lesson for Data Science That is Demonstrated by Palantir’s Struggles · Simply Statistics.” I love write ups that plunk the word statistics near simple.
Here’s the passage I highlighted in money green:
… What is the value of data analysis?, and secondarily, how do you communicate that value?
I want to step away from the Palantir Technologies’ example and consider a broader spectrum of outfits tossing around the jargon “big data,” “analytics,” and synonyms for smart software. One doesn’t communicate value. One finds a person who needs a solution and crafts the message to close the deal.
When a company and its perceived technology catch the attention of allegedly informed buyers, a bandwagon effort kicks in. Talks inside an organization lead to mentions in internal meetings. The vendor whose products and services are the subject of these comments begins to hint at bigger and better things at conferences. Then a real journalist may catch a scent of “something happening” and write an article. Technical talks at niche conferences generate wonky articles usually without dates or footnotes which make sense to someone without access to commercial databases. If a social media breeze whips up the smoldering interest, then a fire breaks out.
A start up should be so clever, lucky, or tactically gifted to pull off this type of wildfire. But when it happens, big money chases the outfit. Once money flows, the company and its products and services become real.
The problem with companies processing a range of data is that there are some friction inducing processes that are tough to coat with Teflon. These include:
- Taking different types of data, normalizing it, indexing it in a meaningful manner, and creating metadata which is accurate and timely.
- Converting numerical recipes, many with built in threshold settings and chains of calculations, into marching band order able to produce recognizable outputs.
- Figuring out how to provide an infrastructure that can sort of keep pace with the flows of new data and the updates/corrections to the already processed data.
- Generating outputs that people in a hurry or in a hot zone can use to positive effect; for example, in a war zone, not get killed when the visualization is not spot on.
The write up focuses on a single company and its alleged problems. That’s okay, but it understates the problem. Most content processing companies run out of revenue steam. The reason is that the licensees or customers want the systems to work better, faster, and more cheaply than predecessor or incumbent systems.
The vast majority of search and content processing systems are flawed, expensive to set up and maintain, and really difficult to use in a way that produces high reliability outputs over time. I would suggest that the problem bedevils a number of companies.
Some of those struggling with these issues are big names. Others are much smaller firms. What’s interesting to me is that the trajectory content processing companies follow is a well worn path. One can read about Autonomy, Convera, Endeca, Fast Search & Transfer, Verity, and dozens of other outfits and discern what’s going to happen. Here’s a summary for those who don’t want to work through the case studies on my Xenky intel site:
Stage 1: Early struggles and wild and crazy efforts to get big name clients
Stage 2: Making promises that are difficult to implement but which are essential to capture customers looking actively for a silver bullet
Stage 3: Frantic building and deployment accompanied by heroic exertions to keep the customers happy
Stage 4: Closing as many deals as possible either for additional financing or for licensing/consulting deals
Stage 5: The early customers start grousing and the momentum slows
Stage 6: Sell off the company or shut down like Delphes, Entopia, Siderean Software and dozens of others.
The problem is not technology, math, or Big Data. The force which undermines these types of outfits is the difficulty of making sense out of words and numbers. In my experience, the task is a very difficult one for humans and for software. Humans want to golf, cruise Facebook, emulate Amazon Echo, or, like water, find the path of least resistance.
Making sense out of information when someone is lobbing mortars at one is a problem which technology can only solve in a haphazard manner. Hope springs eternal and managers are known to buy or license a solution in the hopes that my view of the content processing world is dead wrong.
So far I am on the beam. Content processing requires time, humans, and a range of flawed tools which must be used by a person with old fashioned human thought processes and procedures.
Value is in the eye of the beholder, not in zeros and ones.
Stephen E Arnold, May 19, 2016
Signs of Life from Funnelback
May 19, 2016
Funnelback has been silent as of late, according to our research, but the search company has emerged from the tomb with eyes wide open and a heartbeat. The Funnelback blog has shared some new updates with us. The first bit of news, “Searchless In Seattle? (AKA We’ve Just Opened A New Office!),” explains that Funnelback opened a new office in Seattle, Washington. The search company already has offices in Poland, United Kingdom, and New Zealand, but now they want to establish a branch in the United States. Given their successful track record with the finance, higher education, and government sectors in the other countries, they stand a chance to offer more competition in the US. Seattle also has a reputable technology center and Funnelback will not have to deal with the Silicon Valley group.
The second piece of Funnelback news deals with “Driving Channel Shift With Site Search.” Channel shift is the process of creating the most efficient and cost effective way to deliver information access and usage to users. It can be difficult to implement a channel shift, but increasing the effectiveness of a Web site’s search can have a huge impact.
Being able to locate information on a Web site quickly and effectively not only saves time for more important tasks, it can also drive sales, further reputation, etc.
“You can go further still, using your search solution to provide targeted experiences; outputting results on maps, searching by postcode, allowing for short-listing and comparison baskets and even dynamically serving content related to what you know of a visitor, up-weighting content that is most relevant to them based on their browsing history or registered account.
Couple any of the features above with some intelligent search analytics, that highlight the content your users are finding and importantly what they aren’t finding (allowing you to make the relevant connections through promoted results, metadata tweaking or synonyms), and your online experience is starting to become a lot more appealing to users than that queue on hold at your call centre.”
I have written about it many times, but a decent Web site search function can make or break a site. A poor one not only suggests that the Web site is not professional, it also fails to inspire confidence in the business. It is a very big rookie mistake to make.
Whitney Grace, May 19, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Travel to South Africa Virtually with Google’s Mzansi Experience
May 18, 2016
The article on Elle titled Google SA Launches the Mzansi Experience On Maps illustrates the new Google Street View collection for South Africa. For people without the ability to travel, or scared of malaria or Oscar Pistorius, this collection offers an in-depth platform to view some of South Africa’s natural wonders and parks. The article explains,
“Using images collected by the Street View Tripod and Trekker, Google has created 360-degree imagery of some of South Africa’s most beautiful locations, and created virtual tours that enable visitors to see the sights for themselves on their phones, tablets or computers. Visitors will be able to, for the first time, visit a family of elephants in the Kruger National Park, take a virtual walk on Table Mountain, admire Cape Point, or take a walk along Durban’s Golden Mile.”
For South Africa, this initiative might spark increased tourism once people realize just how much the country has to offer. So many of the images of Africa that we are exposed to in the US are reductive and patronizing, like those ceaseless commercials depicting all of Africa as a small, poverty-stricken village. Google’s new collection helps to promote a more diverse and appealing look at one African country: South Africa. Whether you want to go in person or virtually, this is worth checking out!
Chelsea Kerwin, May 18, 2016
The Trials, Tribulations, and Party Anecdotes Of “Edge Case” Names
May 16, 2016
The article titled These Unlucky People Have Names That Break Computers on BBC Future delves into the strange world of “edge cases” or people with unexpected or problematic names that reveal glitches in the most commonplace systems that those of us named “Smith” or “Jones” take for granted. Consider Jennifer Null, the Virginia woman who can’t book a plane ticket or complete her taxes without extensive phone calls and headaches. The article says,
“But to any programmer, it’s painfully easy to see why “Null” could cause problems for a database. This is because the word “null” is often inserted into database fields to indicate that there is no data there. Now and again, system administrators have to try and fix the problem for people who are actually named “Null” – but the issue is rare and sometimes surprisingly difficult to solve.”
It may be tricky to find people with names like Null. Because of the nature of the controls related to names, issues generally arise for people like Null on systems where it actually does matter, like government forms. This is not an issue unique to the US, either. One Patrick McKenzie, an American programmer living in Japan, has run into regular difficulties because of the length of his last name. But that is nothing compared to Janice Keihanaikukauakahihulihe’ekahaunaele, a Hawaiian woman who campaigned for more flexibility in name length restrictions for state ID cards.
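For the curious, here is a minimal Python sketch of how a surname like Null can vanish. It is an illustration only, not the code of any airline or tax system: the `naive_clean` and `careful_clean` helpers and the `passengers` table are hypothetical names invented for this example. The first helper makes the classic mistake of treating the text "null" as missing data; the second half shows that a database, when handed values through parameters, keeps the surname and a genuinely empty field apart.

```python
import sqlite3


def naive_clean(value):
    """Buggy import step (hypothetical): treats the text 'null' as missing data."""
    if value is None or value.strip().lower() == "null":
        return None
    return value


def careful_clean(value):
    """Only a real None means 'no data'; every string is kept as typed."""
    return value


surname = "Null"
lost = naive_clean(surname)    # the surname is silently discarded
kept = careful_clean(surname)  # the surname survives

# An in-memory database with a hypothetical one-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (surname TEXT)")
# Parameterized inserts keep the string 'Null' distinct from SQL NULL.
conn.execute("INSERT INTO passengers VALUES (?)", ("Null",))
conn.execute("INSERT INTO passengers VALUES (?)", (None,))
rows = conn.execute(
    "SELECT surname, surname IS NULL FROM passengers ORDER BY rowid"
).fetchall()
conn.close()

print(lost, kept, rows)
```

The trouble described in the BBC piece tends to creep in at steps like `naive_clean`, where text from a form or a file is compared against a magic word instead of being checked for a true empty value.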
Chelsea Kerwin, May 16, 2016
Anonymous Hacks Turkish Cops
May 16, 2016
Anonymous has struck again, this time hacking the Turkish General Directorate of Security (EGM) in its crusade against corruption. The International Business Times reports, “Anonymous: Hacker Unleashes 17.8 GB Trove of Data from a Turkish National Police Server.” It is believed that the hacker responsible is ROR[RG], who was also deemed responsible for last year’s Adult Friend Finder breach. The MySQL-friendly files are now available for download at TheCthulhu website, which seems to be making a habit of posting hacked police data.
Why has Anonymous targeted Turkey? Reporter Jason Murdock writes:
“Anonymous has an established history with carrying out cyberattacks against Turkey. In 2015 the group, which is made up of a loose collection of hackers and hacktivists from across the globe, officially ‘declared war’ on the country. In a video statement, the collective accused Turkish President Recep Tayyip Erdoğan’s government of supporting the Islamic State (Isis), also known as Daesh.
“‘Turkey is supporting Daesh by buying oil from them, and hospitalising their fighters,’ said a masked spokesperson at the time. ‘We won’t accept that Erdogan, the leader of Turkey, will help Isis any longer. If you don’t stop supporting Isis, we will continue attacking your internet […] stop this insanity now Turkey. Your fate is in your own hands.’”
We wonder how Turkey will respond to this breach, and what nuggets of troublesome information will be revealed. We are also curious to see what Anonymous does next; stay tuned.
Cynthia Murrell, May 16, 2016
Parts Unknown of Dark Web Revealed in Study
May 13, 2016
While the parts unknown of the internet are said to be populated by terrorists’ outreach and propaganda, research shows a different picture. Quartz reports on this in the article, The dark web is too slow and annoying for terrorists to even bother with, experts say. The research mentioned comes from Thomas Rid and Daniel Moore of the Department of War Studies at King’s College London. They found 140 extremist Tor hidden services; inaccessible or inactive services topped the list with 2,482, followed by 1,021 non-illicit services. As for illicit services, those related to drugs far outnumbered extremism, with 423. The write-up offers a few explanations for the lack of terrorists publishing on the Dark Web,
“So why aren’t jihadis taking advantage of running dark web sites? Rid and Moore don’t know for sure, but they guess that it’s for the same reason so few other people publish information on the dark web: It’s just too fiddly. “Hidden services are sometimes slow, and not as stable as you might hope. So ease of use is not as great as it could be. There are better alternatives,” Rid told Quartz. As a communications platform, a site on the dark web doesn’t do what jihadis need it to do very well. It won’t reach many new people compared to “curious Googling,” as the authors point out, limiting its utility as a propaganda tool. It’s not very good for internal communications either, because it’s slow and requires installing additional software to work on a mobile phone.”
This article provides fascinating research and interesting conclusions. However, we must add unreliable and insecure to the descriptors for why the Dark Web may not be suitable for such uses.
Megan Feil, May 13, 2016
Smart Software a Derby Winner. Watson Does Not Show
May 12, 2016
I read “AI Predicts All Four Top Places in the Kentucky Derby: Machine Uses Swarm Intelligence to Turn $20 bet into $11,000.” Let’s assume that the magic revealed in the write up is spot on. I will not ask the question, “How much would IBM have won if it had bet a couple of hundred million on the Kentucky Derby using the revealed technology?” Believe me, I want to ask that question, but I will exercise restraint.
According to the write up:
An artificial intelligence program developed by Unanimous A.I. successfully predicted the Superfecta at the 142nd Kentucky Derby last Saturday, turning a $20 bet into nearly $11,000. Using ‘Swarm Intelligence,’ the AI was able to correctly choose the winning horse, Nyquist – along with the second, third, and fourth finishers.
The article includes a nifty, real anigif to illustrate how “swarm intelligence” made big money at the track.
The idea originated at the real news outfit TechRepublic.
The trick:
Many minds are better than one.
Should one ask Watson if it can perform the same big payday magic? Nah.
Stephen E Arnold, May 12, 2016
Amusing Mistake Illustrates Machine Translation Limits
May 12, 2016
Machine translation is not quite perfect yet, but we’ve been assured that it will be someday. That’s the upshot of Business Insider’s piece, “This Microsoft Exec’s Hilarious Presentation Fail Shows Why Computer Translation is so Difficult.” Writer Matt Weinberger relates an anecdote shared by Microsoft research head Peter Lee. The misstep occurred during a 2015 presentation, for which Lee set up Skype Translator to translate his words over the speakers into Mandarin as he went. Weinberger writes:
“Part of Lee’s speech involved a personal story of growing up in a ‘snowy town’ in upper Michigan. He noticed that most of the crowd was enraptured — except for a few native Chinese speakers in the crowd who couldn’t stop giggling. After the presentation, Lee says he asked one of those Chinese speakers the reason for the laughter. It turns out that ‘snowy town’ translates into ‘Snow White’s Town.’ Which seems innocent enough, except that it turns out that ‘Snow White’s town’ is actually Chinese slang for ‘a town where a prostitute lives,’ Lee says. Whoops.
“Lee says it wasn’t caught in the profanity filters because there weren’t actually any bad words in the phrase. But it’s the kind of regional flavor where a direct translation of the words can’t bring across the meaning.”
Whoops indeed. The article notes that another problem with Skype Translator is its penchant for completely disregarding non-word utterances, like “um” and “ahh,” that often carry necessary meaning. We’re reminded, though, that these and other problems are expected to be ironed out within the next few years, according to Microsoft Research chief scientist Xuedong Huang. I wonder how many more amusing anecdotes will arise in the meantime.
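To see why a word-by-word filter would wave the phrase through, consider this small Python sketch. It is not how Skype Translator works; the blocklist, the slang table, and both function names are hypothetical stand-ins invented for illustration. The point is only that a check on individual words finds nothing objectionable, while a check that knows the whole phrase does.

```python
# Hypothetical data: neither set comes from Microsoft or Skype Translator.
BLOCKED_WORDS = {"darn", "heck"}
SLANG_PHRASES = {"snow white's town"}


def word_filter_flags(text):
    """Flag text only if an individual word is on the blocklist."""
    words = text.lower().replace(".", " ").split()
    return any(word in BLOCKED_WORDS for word in words)


def phrase_filter_flags(text):
    """Flag text if it contains a known multi-word slang phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in SLANG_PHRASES)


sentence = "I grew up in Snow White's town."
word_hit = word_filter_flags(sentence)      # no single word is blocked
phrase_hit = phrase_filter_flags(sentence)  # the phrase as a whole is known slang

print(word_hit, phrase_hit)
```

Even the phrase check only helps if someone has already listed the slang, which is the regional-flavor problem Lee describes.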
Cynthia Murrell, May 12, 2016
Billions in VC Funding Continues Rinse and Repeat Process
May 12, 2016
In the tech world, the word billion may be losing meaning for some. Pando published a recent editorial called, While the rest of tech struggles, so far VCs have raised more this quarter than in past three years. This piece calls attention to the seemingly never-ending list of VC firms raising ever-more funds. For example, Accel announced their funds were at $2 billion, Founders Fund raised $1 billion in new funds, and Andreessen Horowitz currently works to achieve another $1.5 billion. The author writes,
“It was hard to put that [recent fundraising rounds] in context. I mean, yeah. These are major funds. Is it news that they raised a collective $4.5 billion more at some point? Doesn’t mean they’ll invest it any more quickly. All it means is that the two will still be around for another ten years, which we kinda already guessed. It’s staggeringly hard for a venture fund to actually go out of business, even when it wasn’t some of the first money in Facebook or, in the case of Marc Andreessen, sits on its board. [Disclosure: Marc Andreessen, Founders Fund and Accel are all investors in Pando.]”
As the author wonders, asking Pitchbook if it’s a “bigger quarter than usual”, our eyebrows are not raised by this thought, nor by easy money, bubbles, or unicorns. Nah, this is just routine in Sillycon Valley.
Megan Feil, May 12, 2016
DARPA Seeks Keys to Peace with High-Tech Social Science Research
May 11, 2016
Strife has plagued the human race since the beginning, but the Pentagon’s research arm thinks it may be able to get to the root of the problem. Defense Systems informs us, “DARPA Looks to Tap Social Media, Big Data to Probe the Causes of Social Unrest.” Writer George Leopold explains:
“The Defense Advanced Research Projects Agency (DARPA) announced this week it is launching a social science research effort designed to probe what unifies individuals and what causes communities to break down into ‘a chaotic mix of disconnected individuals.’ The Next Generation Social Science (NGS2) program will seek to harness steadily advancing digital connections and emerging social and data science tools to identify ‘the primary drivers of social cooperation, instability and resilience.’
“Adam Russell, DARPA’s NGS2 program manager, said the effort also would address current research limitations such as the technical and logistical hurdles faced when studying large populations and ever-larger datasets. The project seeks to build on the ability to link thousands of diverse volunteers online in order to tackle social science problems with implications for U.S. national and economic security.”
The initiative aims to blend social science research with the hard sciences, including computer and data science. Virtual reality, Web-based gaming, and other large platforms will come into play. Researchers hope their findings will make it easier to study large and diverse populations. Funds from NGS2 will be used for the project, with emphases on predictive modeling, experimental structures, and boosting interpretation and reproducibility of results.
Will it be the Pentagon that finally finds the secret to world peace?
Cynthia Murrell, May 11, 2016