Bad Big Data? Get More Data Then

March 2, 2017

I like the idea that more is better. The idea is particularly magnetic when a company cannot figure out what its own in-house, proprietary data mean. Think of the legions of consultants from McKinsey and BCG telling executives what their own data “means.” Toss in the notion of Big Data in a giant “data lake,” and you have decision makers who cannot use the information they already have.

Well, how does one fix that problem? Easy. Get more data. That sounds like a plan, particularly when the struggling professionals are the ones charged with figuring out whether sales and marketing investments sort of pay for themselves.

I learned that I need more data by reading “Deepening The Data Lake: How Second-Party Data Increases AI For Enterprises.” The headline introduces the amazing data lake concept along with two giant lakefront developments: more data and artificial intelligence.

Buzzwords? Heck no. Just solid post-millennial reasoning; for example:

there are many marketers with surprisingly sparse data, like the food marketer who does not get many website visitors or authenticated customers downloading coupons. Today, those marketers face a situation where they want to use data science to do user scoring and modeling but, because they only have enough of their own data to fill a shallow lake, they have trouble justifying the costs of scaling the approach in a way that moves the sales needle.

I like that sales needle phrase. Marketers have to justify themselves, and many have only “sparse” data. I would suggest that marketers often have useless data, like the number of unique clicks, but that’s only polluting the data lake.

The fix is interesting. I learned:

we can think of the marketer’s first-party data – media exposure data, email marketing data, website analytics data, etc. – being the water that fills a data lake. That data is pumped into a data management platform (pictured here as a hydroelectric dam), pumped like electricity through ad tech pipes (demand-side platforms, supply-side platforms and ad servers) and finally delivered to places where it is activated (in the town, where people live)… this infrastructure can exist with even a tiny bit of water but, at the end of the cycle, not enough electricity will be generated to create decent outcomes and sustain a data-driven approach to marketing. This is a long way of saying that the data itself, both in quality and quantity, is needed in ever-larger amounts to create the potential for better targeting and analytics.

Yep, more data.

And what about making sense of the additional data? I learned:

The data is also of extremely high provenance, and I would also be able to use that data in my own environment, where I could model it against my first-party data, such as site visitors or mobile IDs I gathered when I sponsored free Wi-Fi at the last Country Music Awards. The ability to gather and license those specific data sets and use them for modeling in a data lake is going to create massive outcomes in my addressable campaigns and give me an edge I cannot get using traditional ad network approaches with third-party segments. Moreover, the flexibility around data capture enables marketers to use highly disparate data sets, combine and normalize them with metadata – and not have to worry about mapping them to a predefined schema. The associative work happens after the query takes place. That means I don’t need a predefined schema in place for that data to become valuable – a way of saying that the inherent observational bias in traditional approaches (“country music fans love mainstream beer, so I’d better capture that”) never hinders the ability to activate against unforeseen insights.

Okay, I think I understand. No wonder companies hire outfits like blue-chip consulting firms to figure out what is going on inside their own companies. Stated another way, insiders live in the swamp. Outsiders can put the swamp into a context and maybe implement some pollution control systems.
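For readers curious about the “associative work happens after the query takes place” claim, here is a minimal sketch of the schema-on-read idea in plain Python. The record layouts, field names, and the users_touched_by helper are invented for illustration; nothing here reflects any particular vendor’s data lake.

    # Heterogeneous records are stored as-is; fields are mapped only when a
    # question is asked. All sample data below is hypothetical.
    import json
    from collections import Counter

    raw_records = [
        '{"source": "web_analytics", "visitor_id": "v1", "page": "/coupons", "geo": "TN"}',
        '{"source": "email", "recipient": "v1", "campaign": "spring_promo", "opened": true}',
        '{"source": "wifi_sponsorship", "mobile_id": "m42", "event": "country_music_awards"}',
    ]

    lake = [json.loads(r) for r in raw_records]  # no schema enforced at load time

    def users_touched_by(field_names):
        """Resolve which identifier a record is about, only at query time."""
        ids = Counter()
        for rec in lake:
            for field in field_names:   # the associative mapping happens here,
                if field in rec:        # after the question is asked
                    ids[rec[field]] += 1
        return ids

    # Query: how many touchpoints does each identifier have, whatever it was called?
    print(users_touched_by(["visitor_id", "recipient", "mobile_id"]))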

Stephen E Arnold, March 2, 2017

Parlez-Vous Qwant, N’est-Ce Pas?

March 2, 2017

One of Google’s biggest rivals is Yandex, at least in Russia.  Yandex is a Russian-owned and operated search engine and, depending on the statistics, is more popular in Russia than the Google.  It goes without saying that a search engine built and designed by native speakers has a significant advantage over foreign competition, and it looks like France wants a chance to beat Google.  Search Engine Journal reports, “Qwant, A French Search Engine, Thinks It Can Take On Google-Here’s Why.”

Qwant was only founded in 2013, and it has grown to serve twenty-one million monthly users in thirty countries.  The French search engine has seen 70% growth each year and will see more with its recent integration with Firefox and a soon-to-be-launched mobile app.  Qwant is very similar to DuckDuckGo in that it does not collect user data.  It also boasts more search categories than news, images, and video; these include music, social media, cars, health, and others.  Qwant has an interesting philosophy:

The company also has a unique philosophy that artificial intelligence and digital assistants can be educated without having to collect data on users. That’s a completely different philosophy than what is shared by Google, which collects every bit of information it can about users to fuel things like Google Home and Google Allo.

Qwant still wants to make a profit with pay-per-click advertising and future partnerships with eBay and TripAdvisor, but it will do so without compromising a user’s privacy.  Qwant has a unique approach to search and building AI assistants, but it has a long way to go before it reaches Google heights.

It needs to engage more users not only on laptops and desktop computers but also on mobile devices.  It also needs to form more partnerships with other browsers.

Bonne chance, Qwant!  But could you share how you plan to make AI assistants without user data?

Whitney Grace, March 2, 2017

 

Inside Loon Balloons

March 2, 2017

You may have heard about Google X’s Project Loon, which aims to bring Internet access to underserved, rural areas using solar-powered balloons. The post, “Here’s How Google Makes its Giant, Internet-Beaming Balloons,” at Business Insider takes us inside that three-year-old project, describing some of how the balloons are made and used. The article is packed with helpful photos and GIFs. We learn that the team has turned to hot-air-balloon manufacturer Raven Aerostar for their expertise. The write-up tells us:

The balloons fly high in the stratosphere at about 60,000 to 90,000 feet above Earth. That’s two to three times as high as most commercial airplanes. Raven Aerostar creates a special outer shell for the balloons, called the film, that can hold a lot of pressure — allowing the balloons to float in the stratosphere for longer. The film is as thin as a typical sandwich bag. … The film is made of a special formulation of polyethylene that allows it to retain strength when facing extreme temperatures of up to -112 degrees Fahrenheit.

We like the sandwich bag comparison. The balloons are tested in sub-freezing conditions at the McKinley Climatic Lab—see the article for dramatic footage of one of their test subjects bursting. We also learn about the “ballonet,” an internal compartment in each balloon that controls altitude and, thereby, direction. Each balloon is equipped with a GPS tracker, of course, and all electronics are secured in a tiny basket below.

One caveat is a bit disappointing—users cannot expect to stream high-quality videos through the balloons. Described as “comparable to 3G,” the service should be enough for one to visit websites and check email. That is certainly far better than nothing and could give rural small-business owners and remote workers the Internet access they need.

Cynthia Murrell, March 2, 2017

Beyond Search Evolution Underway

March 1, 2017

Today we are introducing changes to Beyond Search. We are approaching 10 years of daily publication, and in that time enterprise search and content processing have undergone significant change. Enterprise search is no longer exciting. In fact, a number of companies have pivoted to different services. For many, search has become a utility at best or a ho-hum solution. Web search has degraded to the lowest common denominator of generating revenue via ads. The handful of “objective” Web search systems walk a perilous cliff edge between paying their bills and providing an index to a subset of publicly accessible content. We will continue to cover important items in Beyond Search, but we are shifting our focus to products and services related to voice-centric information access.

The Beyond Alexa blog is in its formative stages. We have started to flow content into this new service. It will include Augmentext-type stories (for information, follow the link), special articles, short videos on voice-related topics, and inclusions (a fancy word for sponsored content or, in my lingo, ads with information value). The idea is that Alexa has become an interesting product niche, and the impact of voice-related information access is now changing rapidly. Frankly, it is more dynamic than the decades-old keyword search business.

You can view the alpha version of Beyond Alexa at this link. As we ramp up the service, we will have other announcements. We passed the 15,000 article milestone in Beyond Search last year. Since early 2008, we have tracked the keyword-centric approach to finding and making sense of information. Our changing focus reflects a point I made years ago in Searcher Magazine: keyword search tied to a keyboard, if not dead, was headed for marginalization.

That’s why we want to explore “beyond” Alexa, Amazon’s odd little voice-activated box which does a bang-up job of providing the temperature and almost friction-free impulse shopping. We think there’s more “beyond” Alexa. We want to explore the new world of ubiquitous and Teflon-slick information access.

Stephen E Arnold, March 1, 2017

Quote to Note: Yahoo and Sunshine

March 1, 2017

Here’s a quote I highlighted. The source is CNBC.com, a “real” journalism type of outfit. The quote appeared in “How Richard Branson, Warren Buffett, Elon Musk and 13 Other Leaders Start the Day.” Marissa Mayer allegedly said:

Today is hard, tomorrow will be worse and the day after tomorrow will be sunshine.

Yep, with a nice golden parachute, the world may look golden. About that Yahoo security breach. Must be sunshine.

Stephen E Arnold, March 1, 2017

Google: Translation King?

March 1, 2017

I read “Google’s AI Software Wins Top Score among Machines in Translation Battle.” Good news for the GOOG. The company recently limited free online translation, and I noted, when translating a test passage from Persian to English, that the free Google system truncated the passage, a problem which did not plague the FreeTranslations.org system. Persian is a bit more of a hill climb than translating Spanish to Italian, but the unpredictable behavior was telling.

The write up, it seems, encountered no glitches. I learned:

Artificial intelligence language software by US Internet giant Google Inc., scored higher than its rival AI machines in a translation battle between humans and machines held in South Korea [in February 2017].

The Google system made kimchi of Systran (a go-to fave for many years) and the Naver system (anyone remember Naver search?).

The Google system performed well, according to the “real” news outfit Korea Herald:

the organizers said the four professional translators scored better in translating random English articles — literature and non-literature — into Korean and other Korean articles into English than the machines. Of the machines, Google scored a total of 28 out of 60, followed by Naver’s automated translation app called Papago with 17 and Systran with 15, the tech company officials with knowledge of the matter said.

Yikes. Humans did better. No guaranteed annual income for these folks.

Who lost the battle? Systran International.

The factoid I noted: the new systems considered an “entire sentence as one unit.”

But humans? Better.

Stephen E Arnold, March 1, 2017

Fake News Is Old News. Fake Research Is Old Too.

March 1, 2017

I read “Crossfire” on the Andrewgelman.com site. I liked the write up. I noted the introduction’s quotation from 1967. I had heard something similar from one of my college instructors in 1962. My recollection is that one of his professors told him about crazy research, fiddled experiments, and lousy math in the late 1930s. My hunch is that this sort of “crisis” declaration has been made ever since folks gathered for lectures about mathiness and rational thought.

I did highlight several passages from the write up:

  • A comment about a flawed study: “The basic problem here is not the results, but the basic implausibility of the methods combined with the results.”
  • On getting published in an academic journal: “Everything will get published, if you just keep submitting it to journal after journal.”
  • On the state of “real” research: “The real problem is that this sort of work is standard operating practice in the field of psychology, no better and no worse (except for the faked data) than the papers on himmicanes, air rage, etc., endorsed by the prestigious National Academy of Sciences. As long as this stuff is taken seriously…”

There’s interest in fake news. A British newspaper staffed with “real” journalists has been banned as a source for Wikipedia. What about “real” scholars who crank out fake research? Oh, right, it takes expertise to identify some academic baloney. Who has time for academic research when watching Facebook videos is the better way to become a critical thinker? Marketing for tenure: Great idea.

Stephen E Arnold, March 1, 2017

Chan and Zuckerberg Invest in Science Research Search Engine, Meta

March 1, 2017

Mark Zuckerberg and his wife Priscilla Chan have dedicated a portion of their fortune to philanthropic causes through their own organization, the Chan Zuckerberg Initiative.  TechCrunch shares that one of its first acquisitions supports scientific research: “Chan Zuckerberg Initiative Acquires And Will Free Up Science Search Engine Meta.”

Meta is a search engine dedicated to science research papers, and it is powered by artificial intelligence.  Chan and Zuckerberg plan to make Meta free in a few months, but only after they have enhanced it.  Once released, Meta will help scientists find the latest papers in their fields of study, which is awesome, as these papers are usually blocked behind paywalls.  Even better, Meta will also help funding organizations identify research areas with potential for investment and impact.  What makes Meta different from other search engines or databases is quite fantastic:

What’s special about Meta is that its AI recognizes authors and citations between papers so it can surface the most important research instead of just what has the best SEO. It also provides free full-text access to 18,000 journals and literature sources.

Meta co-founder and CEO Sam Molyneux writes that “Going forward, our intent is not to profit from Meta’s data and capabilities; instead we aim to ensure they get to those who need them most, across sectors and as quickly as possible, for the benefit of the world.”

CZI has dedicated $3 billion to curing all diseases and has already built the Biohub in San Francisco for medical research.  Meta works like this:

Meta, formerly known as Sciencescape, indexes entire repositories of papers like PubMed and crawls the web, identifying and building profiles for the authors while analyzing who cites or links to what. It’s effectively Google PageRank for science, making it simple to discover relevant papers and prioritize which to read. It even adapts to provide feeds of updates on newly published research related to your previous searches.
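The “PageRank for science” comparison invites a tiny worked example. The sketch below runs plain power iteration over a made-up citation graph; the citation_rank function, the toy papers, and the parameters are hypothetical and say nothing about how Meta actually ranks research.

    # Rank papers by who cites whom, toy-style. The citation graph is made up.
    def citation_rank(cites, damping=0.85, iterations=50):
        """cites maps each paper to the papers it cites (outgoing links)."""
        papers = set(cites) | {p for targets in cites.values() for p in targets}
        rank = {p: 1.0 / len(papers) for p in papers}
        for _ in range(iterations):
            new_rank = {p: (1 - damping) / len(papers) for p in papers}
            for paper, targets in cites.items():
                if not targets:
                    continue
                share = damping * rank[paper] / len(targets)
                for cited in targets:      # a citation passes authority along
                    new_rank[cited] += share
            rank = new_rank
        return rank

    # Hypothetical mini-corpus: paper_c is cited by both other papers, so it ranks highest.
    toy_graph = {
        "paper_a": ["paper_c"],
        "paper_b": ["paper_a", "paper_c"],
        "paper_c": [],
    }
    for paper, score in sorted(citation_rank(toy_graph).items(), key=lambda kv: -kv[1]):
        print(f"{paper}: {score:.3f}")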

Meta is an ideal search engine because it crawls the entire Web (supposedly) and returns verified information, not to mention surfacing potential research partnerships and breakthroughs.  This is the type of database researchers have dreamed of for years.  Would CZI be willing to fund something similar for fields other than science?  Will they run into trouble with other organizations less interested in philanthropy?

Whitney Grace, March 1, 2017

Dark Web Drug Dealers Busted in Finland

March 1, 2017

Law enforcement’s focus on the Dark Web seems to be paying off, as we learn from the write-up, “Finland: Dark Web Drug Operation Exposed” at Hetq, an outlet of the Association of Investigative Journalists. In what was described as Finland’s largest drug bust, authorities seized over a million dollars’ worth of narcotics from a network selling their wares on the Dark Web. We learn:

The network is alleged to have imported €2 million (US$ 2.2 million) worth of drugs between 2014 and 2016, selling them on the dark web site Silkkitie. More than 40 kilograms of powdered narcotics, such as amphetamine, heroin and cocaine, as well as 40,000 ecstasy tablets and 30,000 LSD blotters were smuggled into Finland from the Netherlands and Germany, and then sold on the site. …

As part of the investigation, customs officers in April seized at least €1.1 million worth of heroin, cocaine, methamphetamine, MDMA and ecstasy in the coastal town of Kustavi. The same month, police arrested three Finnish citizens.

The write-up notes that Silkkitie users communicated through encrypted messages under pseudonyms, and that Bitcoin was the currency used. We’re also reminded that Silkkitie, a.k.a. Valhalla, is one of the Dark Web’s most popular drug marketplaces. The Finnish site was launched in 2013.

Cynthia Murrell, March 1, 2017
