Palantir Technology: Making Some Waves

March 16, 2017

I don’t know about you, but I am not keen on waking up one morning and finding protestors with signs in front of my house. Bummer. One of the motive forces behind Palantir had the pleasure of this experience on March 11, 2017. You can see the invitation to the protest against Palantir in general and Peter Thiel in particular at this link. Note that it helpfully provides Mr. Thiel’s private residence address. Nifty.

I also found interesting the article “Palantir’s Man In The Pentagon.” Buzzfeed seems to have a keen interest in Palantir. I follow Palantir’s technology too. Buzzfeed does seem to come up with some enthusiastic writing.

I assume, of course, that everything I read on the Internet is accurate. Therefore, I learned:

A former Palantir “evangelist” has taken a top job at the Defense Department, after spending years lobbying the Pentagon on behalf of the Silicon Valley company.

As a former laborer in the vineyards of Booz Allen Hamilton, I know that this is not a shocker. People routinely move from outfit to outfit as they try to create the perfect work history, make money, and do some interesting, even entertaining, work.

The write up told me:

Mikolay, 37, worked for Palantir for four years as an “evangelist,” according to his LinkedIn profile, meaning he met with government officials to sell Palantir’s software. According to a confidential email obtained by BuzzFeed News, Mikolay’s role at Palantir involved pitching the Army on the battlefield intelligence contract, which has become something of a white whale for the Silicon Valley firm.

I also noted:

A Defense Department spokesperson, Capt. Jeff Davis, told BuzzFeed News in a statement: “Mr. Mikolay took action to ensure he would not participate in any matters that would have a direct and predictable effect on Palantir, consistent with conflict of interest statutes and government ethics regulations. Further, he worked with the DoD Standards of Conduct Office to implement a screening arrangement to ensure all particular matters involving Palantir are forwarded to another senior defense official for appropriate disposition. Such recusals are not uncommon for civilian appointees who have worked previously in the private sector.”

Frankly I was more interested in this statement:

Mikolay, in joining the Defense Department, is returning to an agency where he once worked as a speechwriter for former Defense Secretary Leon Panetta. He is a Navy veteran who attended the United States Naval Academy and got a master’s degree at Princeton’s Woodrow Wilson School of Public and International Affairs.

Yep, shocker. A job change in DC with a new administration in office. Hardly surprising because it is standard operating procedure along the banks of the Potomac.

Stephen E Arnold, March 16, 2017

Attivio Takes on SCOLA Repository

March 16, 2017

We noticed that Attivio is back to enterprise search, and now uses the fetching catchphrase, “data dexterity company.” Their News page announces, “Attivio Chosen as Enterprise Search Platform for World’s Largest Repository of Foreign Language Media.” We’ve been keeping an eye on Attivio as it grows. With this press release, Attivio touts a large, recent feather in their cap—providing enterprise search services to SCOLA, a non-profit dedicated to helping different peoples around the world learn about each other. This tool enables SCOLA’s subscribers to find any content in any language, we’re told. The organization regards today’s information technology as crucial to their efforts. The write-up explains: 

SCOLA provides a wide range of online language learning services, including international TV programming, videos, radio, and newspapers in over 200 native languages, via a secure browser-based application. At 85 terabytes, it houses the largest repository of foreign language media in the world. With its users asking for an easier way to find and categorize this information, SCOLA chose Attivio Enterprise Search to act as the primary access point for information through the web portal. This enables users, including teachers and consumers, to enter a single keyword and find information across all formats, languages and geographical regions in a matter of seconds. After looking at several options, SCOLA chose Attivio Enterprise Search because of its multi-language support and ease of customization. ‘When you have 84,000 videos in 200 languages, trying to find the right content for a themed lesson is overwhelming,’ said Maggie Artus, project manager at SCOLA. ‘With the Attivio search function, the user only sees instant results. The behind-the-scenes processing complexity is completely hidden.’

Attivio was founded in 2007, and is headquartered in Newton, Massachusetts. The company’s client roster includes prominent organizations like UBS, Cisco, Citi, and DARPA. They are also hiring for several positions as of this writing.

Cynthia Murrell, March 16, 2017

IBM Out-Watsons Watson PR

March 15, 2017

I noted that IBM can store data in an atom. I marveled at IBM’s helping with arthritis research. I withdrew my life savings to bet on IBM Watson’s predictions for the next big thing. Wow. Busy that Watson smart software is. Versatile too.

What I found interesting is that IBM has announced that it has knocked the cover off the ball with its speech recognition capabilities. Too bad Amazon, Google, Microsoft, and Nuance think they know how to perform this Star Trek-type function trick. Clueless pretenders if the IBM assertion is accurate.

Navigate to another IBM “real” journalistic revelation in “Why IBM’s Speech Recognition Breakthrough Matters for AI and IoT.”

I learned:

IBM recently announced that its speech recognition system achieved an industry record of 5.5% word error rate, coming closer to human parity.

Yep, an announcement. Remember. Google’s speech recognition is on lots of mobile phones. Dear old Microsoft, despite the missteps of Tay, landed a deal with the dazed and confused UK National Health Service. And Amazon. Well, there is that Alexa Echo and Dot product line. And IBM? Well, an announcement.

The write up reveals that a blog post makes clear that IBM is improving its speech recognition. As proof, the write up points out that IBM’s error rate declined. IBM does that with its revenues, so maybe this is a characteristic of the Big Blue machine.
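For readers unfamiliar with the metric behind the 5.5 percent claim, word error rate is conventionally computed as the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the system’s hypothesis, divided by the number of words in the reference. The sketch below is a generic textbook formulation, not IBM’s actual evaluation code:

```python
# Generic word error rate (WER) calculation: Levenshtein edit distance
# over words, normalized by reference length. Illustrative only; it is
# not drawn from IBM's benchmark tooling.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution across six reference words, roughly 0.167
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Note that, as Bengio observes later in the article, the number one gets out of such a formula depends heavily on how the reference transcript itself was produced.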

But I particularly enjoyed this bit of analysis:

Reaching human-level performance in AI tasks such as speech or object recognition remains a scientific challenge, according to Yoshua Bengio, leader of the University of Montreal’s Montreal Institute for Learning Algorithms (MILA) Lab, as quoted in the blog post. Standard benchmarks do not always reveal the variations and complexities of real data, he added. “For example, different data sets can be more or less sensitive to different aspects of the task, and the results depend crucially on how human performance is evaluated, for example using skilled professional transcribers in the case of speech recognition,” Bengio said.

Isn’t this the outfit which Microsoft relies upon for some of its speech wizardry? So what exactly is IBM doing? Let’s ask Alexa.

Stephen E Arnold, March 15, 2017

Yandex Incorporates Semantic Search

March 15, 2017

Apparently ahead of a rumored IPO launch, Russian search firm Yandex is introducing “Spectrum,” a semantic search feature. We learn of the development from “Russian Search Engine Yandex Gets a Semantic Injection” at the Association of Internet Research Specialists’ Articles Share pages. Writer Wushe Zhiyang observes that, though Yandex claims Spectrum can read users’ minds, the tech appears to be a mix of semantic technology and machine learning. He specifies:

The system analyses users’ searches and identifies objects like personal names, films or cars. Each object is then classified into one or more categories, e.g. ‘film’, ‘car’, ‘medicine’. For each category there is a range of search intents. [For example] the ‘product’ category will have search intents such as buy something or read customer reviews. So we have a degree of natural language processing, taxonomy, all tied into ‘intent’, which sounds like a very good recipe for highly efficient advertising.

But what if a search query has many potential meanings? Yandex says that Spectrum is able to choose the category and the range of potential user intents for each query to match a user’s expectations as close as possible. It does this by looking at historic search patterns. If the majority of users searching for ‘gone with the wind’ expect to find a film, the majority of search results will be about the film, not the book.

As users’ interests and intents tend to change, the system performs query analysis several times a week, says Yandex. This amounts to Spectrum analysing about five billion search queries.
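The core “majority intent” idea in the quoted passage, serving the category most historical users meant by a query, can be sketched as a simple vote over logged query-category pairs. The data and function names below are hypothetical; Yandex’s Spectrum obviously layers far richer NLP, taxonomy, and machine learning signals on top of anything this simple:

```python
# Toy illustration of choosing a query's dominant category from
# historic search behavior, per the Spectrum description above.
# The log and names are invented for illustration.
from collections import Counter

# Hypothetical log of (query, category the user actually wanted) pairs.
query_log = [
    ("gone with the wind", "film"),
    ("gone with the wind", "film"),
    ("gone with the wind", "book"),
    ("gone with the wind", "film"),
]

def dominant_category(query: str, log) -> str:
    """Return the category most users historically meant by this query."""
    counts = Counter(cat for q, cat in log if q == query)
    return counts.most_common(1)[0][0]

print(dominant_category("gone with the wind", query_log))  # film
```

Re-running such an analysis over fresh logs a few times a week, as Yandex describes, would let the dominant category drift as user interests change.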

Yandex has been busy. The site recently partnered with VKontakte, Russia’s largest social network, and plans to surface public-facing parts of VKontakte user profiles, in real time, in Yandex searches. If the rumors of a plan to go public are true, will these added features help make Yandex’s IPO a success?

Cynthia Murrell, March 15, 2017

Now Online: HonkinNews for 14 March 2017

March 14, 2017

The HonkinNews for March 14, 2017, tackles the ever juicy subject of selling ontology consulting. Bet you cannot wait. We reveal the real reason why poobahs are pitching custom classification systems and hand-crafted controlled term lists. We also nibble at the notion of “relaxed queries.” Our example is Yandex, but other Web search systems use the method to justify displaying more ads with less potential relevance. Microsoft has killed its social media service, which none of the goslings in Harrod’s Creek have used. Google is chasing the social media train again, this time with the Kaggle acquisition. If at first you don’t succeed, buy, buy again. We also take a moment to comment about Google’s smart software which is trying to filter hate speech. Believe it or not, our fearless leader connects the system with Google’s jumping robots and a classroom filled with young children. You can find the video at this link.

Ken Toth, March 14, 2017

Is Google Plucking a Chicken Joint?

March 14, 2017

Real chicken or fake news? You decide. I read “Google, What the H&%)? Search Giant Wrongly Said Shop Closed Down, Refused to List the Truth.” The write up reports that a chicken restaurant is clucking mad about how Google references the eatery. The Google, according to the article, thinks the fowl peddler is out of business. The purveyor of poultry disagrees.

The write up reports:

Kaie Wellman says that her rotisserie chicken outlet Arrosto, in Portland, Oregon, US, was showing up as “permanently closed” on Google’s mobile search results.

Ms Wellman contacted the Google and allegedly learned that Google would not change the listing. The fix seems to be that the bird roaster has to get humans to input data via Google Maps. The smart Google system will recognize the inputs and make the fix.

The write up reports that the Google listing is now correct. The fowl mix-up is now resolved.

Yes, the Google. Relevance, precision, recall, and accuracy. Well, maybe not so much of these ingredients when one is making fried mobile outputs.

Stephen E Arnold, March 14, 2017

The Human Effort Behind AI Successes

March 14, 2017

An article at Recode, “Watson Claims to Predict Cancer, but Who Trained It To Think,” reminds us that even the most successful AI software was trained by humans, using data collected and input by humans. We have developed high hopes for AI, expecting it to help us cure disease, make our roads safer, and put criminals behind bars, among other worthy endeavors. However, we must not overlook the datasets upon which these systems are built, and the human labor used to create them. Writer (and CEO of DaaS firm Captricity) Kuang Chen points out:

The emergence of large and highly accurate datasets have allowed deep learning to ‘train’ algorithms to recognize patterns in digital representations of sounds, images and other data that have led to remarkable breakthroughs, ones that outperform previous approaches in almost every application area. For example, self-driving cars rely on massive amounts of data collected over several years from efforts like Google’s people-powered street canvassing, which provides the ability to ‘see’ roads (and was started to power services like Google Maps). The photos we upload and collectively tag as Facebook users have led to algorithms that can ‘see’ faces. And even Google’s 411 audio directory service from a decade ago was suspected to be an effort to crowdsource data to train a computer to ‘hear’ about businesses and their locations.

Watson’s promise to help detect cancer also depends on data: decades of doctor notes containing cancer patient outcomes. However, Watson cannot read handwriting. In order to access the data trapped in the historical doctor reports, researchers must have had to employ an army of people to painstakingly type and re-type (for accuracy) the data into computers in order to train Watson.

Chen notes that more and more workers in regulated industries, like healthcare, are mining for gold in their paper archives—manually inputting the valuable data hidden among the dusty pages. That is a lot of data entry. The article closes with a call for us all to remember this caveat: when considering each new and exciting potential application of AI, ask where the training data is coming from.

Cynthia Murrell, March 14, 2017

Significant Others: Salesforce Einstein and IBM Watson

March 13, 2017

The flow of semi-smart software publicity continues. Keep in mind that most smart software is little more than search with wrappers performing special operations.

The proud parents of Einstein and Watson announced that their smart software systems have become an item. Salesforce has scripts and numerical recipes to make it easier to figure out if a particular client really wants to drop the service. Watson brings Jeopardy type question answering and lots of data training to the festive announcement party.

I enjoyed “Salesforce Will Be Using IBM Watson to Make its Einstein AI Service Even Smarter.” The write up strikes me as somewhat closer to the realities of the tie up than the inebriated best wishes emanating from many other “real” journalists. For example, the write up asserts:

By bringing its all-important Watson service to Salesforce and Einstein customers, IBM is determined to double-down on that huge Salesforce consulting market, not compete with it.

IBM cannot “become” Salesforce. But Salesforce generates a need for services in many large companies. The idea is that Einstein does its thing to help a sales professional close a deal, and IBM Watson can do its thing to make “sense” of the content related to the company paying Salesforce for an integrated sales prospecting and closing system.

My take is that this is not much more than a co-publicity set up with the hope that the ability of Salesforce to talk about its tie up with IBM will generate sales and buzz. IBM hopes that its PR capabilities will produce some mileage for the huffing and puffing Watson “solution.”

In my opinion, IBM is turning cartwheels to get substantial, evergreen revenue from the Watson thing. But IBM may be pushing another fantasy animal into the revenue race. Quantum computing as a service is the next big thing. Now is quantum computing something one can actually use?

Nah, but the point is that revenue is not news at IBM. Quantum computing gives the IBM marketers another drum to bang. Moving in with Salesforce provides a way to sell something, anything, maybe.

Stephen E Arnold, March 13, 2017

Big Data Requires More Than STEM Skills

March 13, 2017

It will require training Canada’s youth in design and the arts, as well as STEM subjects, if that country is to excel in today’s big-data world. That is the advice of a trio of academic researchers in that country, Patricio Davila, Sara Diamond, and Steve Szigeti, who declare, “There’s No Big Data Without Intelligent Interface” at the Globe and Mail. The article begins by describing why data management is now a crucial part of success throughout society, then emphasizes that we need creative types to design intuitive user interfaces and effective analytics representations. The researchers explain:

Here’s the challenge: For humans, data are meaningless without curation, interpretation and representation. All the examples described above require elegant, meaningful and navigable sensory interfaces. Adjacent to the visual are emerging creative, applied and inclusive design practices in data “representation,” whether it’s data sculpture (such as 3-D printing, moulding and representation in all physical media of data), tangible computing (wearables or systems that manage data through tactile interfaces) or data sonification (yes, data can make beautiful music).

Infographics is the practice of displaying data, while data visualization or visual analytics refers to tools or systems that are interactive and allow users to upload their own data sets. In a world increasingly driven by data analysis, designers, digital media artists, and animators provide essential tools for users. These interpretive skills stand side by side with general literacy, numeracy, statistical analytics, computational skills and cognitive science.

We also learn about several specific projects undertaken by faculty members at OCAD University, where our three authors are involved in the school’s Visual Analytics Lab. For example, the iCity project addresses transportation network planning in cities, and the Care and Condition Monitor is a mobile app designed to help patients and their healthcare providers better work together in pursuit of treatment goals. The researchers conclude with an appeal to their nation’s colleges and universities to develop programs that incorporate data management, data numeracy, data analysis, and representational skills early and often. Good suggestion.

Cynthia Murrell, March 13, 2017

To Make Data Analytics Sort of Work: Attention to Detail

March 10, 2017

I read “The Much-Needed Business Facet for Modern Data Integration.” The write up presents some useful information. Not many of the “go fast and break things” crowd will relate to some of the ideas and suggestions, but I found the article refreshing.

What does one do to make modern data centric activities sort of work? The answers are ones that I have found many more youthful wizards often elect to ignore.

Here they are:

  1. Do data preparation. Yikes. Normalization of data. I have fielded this question in the past, “Who has time for that?” Answer: Too few, gentle reader. Too few.
  2. Profile the data. Another gasp. In my experience it is helpful to determine what data are actually germane to the goal. Think about the polls for the recent presidential election.
  3. Create data libraries. Good idea. But it is much more fun to just recreate data sets. Very Zen like.
  4. Have rules which are now explained as “data governance.” The jargon does not change the need for editorial and data guidelines.
  5. Take a stab at data quality. This is another way of saying, “Clean up the data.” Even whiz bang modern systems are confused with differences like I.B.M and International Business Machines or numbers with decimal points in the incorrect place.
  6. Get colleagues in the game. This is a good idea, but in many organizations in which I have worked “team” is spelled “my bonus.”
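Points one and five above, normalizing and cleaning, are less abstract than they sound. A minimal sketch of entity normalization, using the article’s own I.B.M versus International Business Machines example, might look like this (the alias table is illustrative, not from the write up):

```python
# Minimal entity-name normalization: map known variants of a company
# name to one canonical form so downstream analytics do not treat
# "I.B.M" and "International Business Machines" as different firms.
# The alias table is a hypothetical example, not a real reference list.
CANONICAL = {
    "i.b.m": "IBM",
    "i.b.m.": "IBM",
    "ibm": "IBM",
    "international business machines": "IBM",
}

def normalize_company(raw: str) -> str:
    """Return the canonical name for a known variant; pass through unknowns."""
    key = raw.strip().lower()
    return CANONICAL.get(key, raw.strip())

records = ["I.B.M", "International Business Machines", "ibm", "Attivio"]
print([normalize_company(r) for r in records])
# ['IBM', 'IBM', 'IBM', 'Attivio']
```

Real projects replace the hand-built table with curated reference data and fuzzy matching, which is exactly the unglamorous dog work the checklist is about.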

Useful checklist. I fear that those who color unicorns will not like the dog work which accompanies implementing the ideas. That’s what makes search and content processing so darned interesting.

Stephen E Arnold, March 10, 2017
