How Data Science Pervades

May 2, 2017

We think Information Management may be overstating a bit with the headline, “Data Science Underlies Everything the Enterprise Now Does.”  While perhaps not underpinning quite “everything,” the use of data analysis has indeed spread throughout many companies (especially larger ones).

Writer Michael O’Connell cites a few key developments over the last year alone, including the rise of representative data, a wider adoption of predictive analysis, and the refinement of customer analytics. He predicts even more changes in the coming year, then uses a hypothetical telecom company for a series of examples. He concludes:

You’ll note that this model represents a significant broadening beyond traditional big data/analytics functions. Such task alignment and comprehensive integration of analytics functions into specific business operations enable high-value digital applications ranging far beyond our sample Telco’s churn mitigation — cross-selling, predictive and condition-based maintenance, fraud detection, price optimization, and logistics management are just a few areas where data science is making a huge difference to the bottom line.

See the article for more on the process of turning data into action, as illustrated with the tale of that imaginary telecom’s data-wrangling adventure.
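For readers who want a concrete picture of that churn-mitigation example, here is a minimal, purely illustrative churn-scoring sketch in Python. The feature names, numbers, and model choice are our own assumptions, not anything from O’Connell’s article.

```python
# Minimal churn-scoring sketch (illustrative only; features and figures are invented).
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy historical data: monthly charges, support calls, and whether the customer left.
history = pd.DataFrame({
    "monthly_charges": [70, 20, 95, 40, 110, 30, 85, 25],
    "support_calls":   [5,  0,  7,  1,  6,   0,  4,  1],
    "churned":         [1,  0,  1,  0,  1,   0,  1,  0],
})

model = LogisticRegression()
model.fit(history[["monthly_charges", "support_calls"]], history["churned"])

# Score current customers; high probabilities flag candidates for retention offers.
current = pd.DataFrame({"monthly_charges": [100, 22], "support_calls": [6, 0]})
print(model.predict_proba(current)[:, 1])
```

The real work, as the article makes clear, lies less in the scoring itself than in wiring results like these into day-to-day business operations.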

Cynthia Murrell, May 2, 2017

The Design Is Old School, but the Info Is Verified

April 5, 2017

For a moment, let us go back to the 1990s. The Internet was still new, Flash animation was “da bomb” (to quote the vernacular of the day), and Web site design was plain HTML. You could see prime examples of early Web site design by visiting the Internet Archive, but why hit the time machine search button when you can simply visit RefDesk.com?

RefDesk is reminiscent of an old AOL landing page, except it lacks the cheesy graphics and provides higher-quality information. RefDesk is an all-inclusive reference and fact-checking Web site that pools links to various sources of quality information into one complete resource. It keeps things simple with a plain HTML format and groups sources together by content and relevance: search engines, news outlets, weather, dictionaries, games, white pages, yellow pages, and specialized topics that change daily. RefDesk’s mission is to take the guesswork out of the Internet:

The Internet is the world’s largest library containing millions of books, artifacts, images, documents, maps, etc. There is but one small problem in this library: everything is scattered about on the floor, with growing hordes of confused and bewildered users frantically shifting through the maze, occasionally crying out, ‘Great Scott, look at what I just found!’ Enter refdesk.

Refdesk has three goals: (1) fast access, (2) intuitive and easy navigation and (3) comprehensive content, rationally indexed. The prevailing philosophy here is: simplicity. “Simplicity is the natural result of profound thought.” And, very difficult to achieve.

Refdesk is the one-stop source for verified, credible resources because it is run by a team dedicated to fishing the facts out of the filth that runs amok on other sites. It set up shop in 1995, and the only thing that has changed since is the information. It might be basic, it might be a tad bland, but the content is curated to ensure credibility.

Elementary school kids, take note: you can use this for your history report.

Whitney Grace, April 5, 2017

 

Dataminr Presented to Foreign Buyers Through Illegal Means

April 4, 2017

One thing any company wants is more profit, and companies generate more profit by selling their products and services to more clients. Dataminr wanted to add more clients to its roster, and a former Hillary Clinton aide wanted to use his political connections to land foreign clients for Dataminr. The Verge has the scoop on how this happened in “Leaked Emails Reveal How Dataminr Was Pitched To Foreign Governments.”

Dataminr is a company specializing in analyzing Twitter data and turning it into actionable data sets in real time. The Clinton aide’s personal company, Beacon Global Strategies, arranged to meet with at least six embassies and pitch Dataminr’s services. All of this came to light when the emails were leaked to the public on DCLeaks.com:

The leaked emails shed light on the largely unregulated world of international lobbying in Washington, where “strategic advisors,” “consultants,” and lawyers use their US government experience to benefit clients and themselves, while avoiding public scrutiny both at home and overseas.

Beacon isn’t registered to lobby in Washington. The firm reportedly works for defense contractors and cybersecurity companies, but it hasn’t made its client list public, citing non-disclosure agreements. Beacon’s relationship with Dataminr has not been previously reported.

The aide sold Dataminr’s services in a way that suggests they could be used for surveillance; Beacon even described Dataminr as a way to find an individual’s digital footprint. Twitter’s developer agreement forbids third parties from selling user data if it will be used for surveillance. Yet Twitter owns a 5% stake in Dataminr and gives it direct access to its data firehose.
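Dataminr’s actual pipeline is proprietary, but the basic idea of turning a raw tweet stream into “actionable” alerts can be sketched in a few lines of Python. Everything below, including the alert terms and the sample tweets, is invented for illustration.

```python
# Rough sketch of turning a stream of tweets into alerts (keyword filter only;
# Dataminr's real methods are not public, and this data is invented).
ALERT_TERMS = {"outage", "explosion", "evacuation"}

def alerts(tweet_stream):
    """Yield tweets whose text mentions any alert term."""
    for tweet in tweet_stream:
        text = tweet["text"].lower()
        if any(term in text for term in ALERT_TERMS):
            yield tweet

sample_stream = [
    {"user": "citizen1", "text": "Huge power outage downtown right now"},
    {"user": "citizen2", "text": "Lovely weather today"},
]

for hit in alerts(sample_stream):
    print(hit["user"], "->", hit["text"])
```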

It sounds like some back-alley dealing took place. The ultimate goal for the Clinton aide was to make money, possibly funneling it back into his company or taking a kickback from Dataminr. The US Lobbying Disclosure Act makes it illegal for a company to act in this manner, but there are loopholes to skirt around it.

This is more proof that a tool built for good can also be used in a harmful manner. It raises the question, though: if people leave their personal information all over the Internet, is it not free for the taking?

Whitney Grace, April 4, 2017

Alternative (Aka Fake) News Not Going Anywhere

March 29, 2017

The article titled “The Rise of Fake News Amidst the Fall of News Media” on Silicon Valley Watcher makes a convincing argument that fake news is the inevitable result of the collective failure to invest in professional media. The author, Tom Foremski, used to write for the Financial Times. He argues that the nearly continuous layoffs at professional media organizations such as the New York Times, Salon, The Guardian, AP, Daily Dot, and IBT illustrate the lack of a sustainable business model for professional news media. The article states,

People won’t pay for the news media they should be reading but special interest groups will gladly pay for the media they want them to read. We have important decisions to make about a large number of issues such as the economy, the environment, energy, education, elder healthcare and those are just the ones that begin with the letter “E” — there’s plenty more issues. With bad information we won’t be able to make good decisions. Software engineers call this GIGO – Garbage In Garbage Out.

This issue affects us all; fake news even got a man elected to the highest office in the land. With Donald Trump demonstrating on a daily basis that he has no interest in the truth, whether regarding the size of the crowds at his inauguration or the reason he lost the popular vote to Hillary Clinton, the news industry is already in a crouch. Educating people to differentiate between true and false news is nearly impossible when it is so much easier and more comfortable for people to read only what reconfirms their worldview. Foremski leaves it up to the experts and the visionaries to solve the problem and find a way to place a monetary value on professional news media.

Chelsea Kerwin, March 29, 2017

Intelligence Researchers Pursue Comprehensive Text Translation

March 27, 2017

The US Intelligence Advanced Research Projects Activity (IARPA) is seeking programmers to help develop a tool that can quickly search text in over 7,000 languages. Ars Technica reports on the initiative (dubbed Machine Translation for English Retrieval of Information in Any Language, or MATERIAL) in the article, “Intelligence Seeks a Universal Translator for Text Search in Any Language.” As it is, it takes time to teach a search algorithm to translate each language. For the most widely used tongues, this process is quite well along, but not so for “low-resource” languages. Writer Sean Gallagher explains:

To get reliable translation of text based on all variables could take years of language-specific training and development. Doing so for every language in a single system—even to just get a concise summary of what a document is about, as MATERIAL seeks to do—would be a tall order. Which is why one of the goals of MATERIAL, according to the IARPA announcement, ‘is to drastically decrease the time and data needed to field systems capable of fulfilling an English-in, English-out task.’

Those taking on the MATERIAL program will be given access to a limited set of machine translation and automatic speech recognition training data from multiple languages ‘to enable performers to learn how to quickly adapt their methods to a wide variety of materials in various genres and domains,’ the announcement explained. ‘As the program progresses, performers will apply and adapt these methods in increasingly shortened time frames to new languages.’

Interested developers should note that candidates are not expected to have foreign-language expertise. Gallagher notes that IARPA plans to publish its research publicly; he looks forward to wider access to foreign-language documents down the road, should the organization meet its goal.
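To make the “English-in, English-out” goal concrete, here is a deliberately naive Python sketch: foreign-language documents are translated word by word with a toy dictionary, then matched against an English query. MATERIAL’s actual methods are far more sophisticated; the dictionary and documents below are invented.

```python
# Naive "English-in, English-out" sketch: translate foreign documents word by word
# with a toy dictionary, then match an English query against the translations.
# (Everything here is invented; MATERIAL's real approach is far richer.)
TOY_DICT = {"es": {"inundación": "flood", "carretera": "road", "cerrada": "closed"}}

def translate(text, lang):
    table = TOY_DICT.get(lang, {})
    return " ".join(table.get(word, word) for word in text.lower().split())

def search(query, documents):
    """Return English renderings of documents that share a word with the query."""
    query_words = set(query.lower().split())
    for lang, text in documents:
        english = translate(text, lang)
        if query_words & set(english.split()):
            yield english

documents = [("es", "inundación carretera cerrada"), ("es", "concierto esta noche")]
print(list(search("road flood", documents)))  # English in, English out
```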

Cynthia Murrell, March 27, 2017

MBAs Under Siege by Smart Software

March 23, 2017

The article titled “Silicon Valley Hedge Fund Takes Over Wall Street With AI Trader” on Bloomberg explains how Sentient Technologies Inc. plans to take human error out of the stock market. Co-founder Babak Hodjat spent the past 10 years building an AI system capable of reviewing billions of pieces of data and learning trends and techniques to make money trading stocks. The article explains that the system is based on evolution:

According to patents, Sentient has thousands of machines running simultaneously around the world, algorithmically creating what are essentially trillions of virtual traders that it calls “genes.” These genes are tested by giving them hypothetical sums of money to trade in simulated situations created from historical data. The genes that are unsuccessful die off, while those that make money are spliced together with others to create the next generation… Sentient can squeeze 1,800 simulated trading days into a few minutes.

Hodjat believes that handing the reins over to a machine is wise because it eliminates bias and emotion. But outsiders wonder whether investors will be willing to put their trust entirely in a system. Other hedge funds, like Man AHL, rely on machine learning too, but nowhere near to the extent Sentient does. As Sentient brings in outside investors later this year, the success of the platform will become clearer.
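For the curious, the evolutionary loop described in that passage can be sketched in a few dozen lines of Python. This is a bare-bones illustration of the “genes die off or get spliced” idea, not Sentient’s system; every number and rule below is invented.

```python
# Bare-bones sketch of the evolutionary loop described above: random "genes"
# (here, a buy threshold and a sell threshold) are scored on simulated prices,
# the worst die off, and the best are spliced to form the next generation.
import random

PRICES = [100 + random.gauss(0, 2) for _ in range(250)]  # one fake year of prices

def fitness(gene):
    """Simulate trading with this gene and return the final portfolio value."""
    buy_below, sell_above = gene
    cash, shares = 1000.0, 0.0
    for price in PRICES:
        if shares == 0 and price < buy_below:
            shares, cash = cash / price, 0.0
        elif shares > 0 and price > sell_above:
            cash, shares = shares * price, 0.0
    return cash + shares * PRICES[-1]

def splice(a, b):
    return (a[0], b[1])  # crossover: take one threshold from each parent

population = [(random.uniform(95, 100), random.uniform(100, 105)) for _ in range(50)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    survivors = population[:25]                      # unsuccessful genes die off
    children = [splice(random.choice(survivors), random.choice(survivors))
                for _ in range(25)]
    population = survivors + children

print("best gene:", max(population, key=fitness))
```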

Chelsea Kerwin, March 23, 2017

The Human Effort Behind AI Successes

March 14, 2017

An article at Recode, “Watson Claims to Predict Cancer, but Who Trained It To Think,” reminds us that even the most successful AI software was trained by humans, using data collected and input by humans. We have developed high hopes for AI, expecting it to help us cure disease, make our roads safer, and put criminals behind bars, among other worthy endeavors. However, we must not overlook the datasets upon which these systems are built, and the human labor used to create them. Writer (and CEO of DaaS firm Captricity) Kuang Chen points out:

The emergence of large and highly accurate datasets have allowed deep learning to ‘train’ algorithms to recognize patterns in digital representations of sounds, images and other data that have led to remarkable breakthroughs, ones that outperform previous approaches in almost every application area. For example, self-driving cars rely on massive amounts of data collected over several years from efforts like Google’s people-powered street canvassing, which provides the ability to ‘see’ roads (and was started to power services like Google Maps). The photos we upload and collectively tag as Facebook users have led to algorithms that can ‘see’ faces. And even Google’s 411 audio directory service from a decade ago was suspected to be an effort to crowdsource data to train a computer to ‘hear’ about businesses and their locations.

Watson’s promise to help detect cancer also depends on data: decades of doctor notes containing cancer patient outcomes. However, Watson cannot read handwriting. In order to access the data trapped in the historical doctor reports, researchers must have had to employ an army of people to painstakingly type and re-type (for accuracy) the data into computers in order to train Watson.
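That “type and re-type (for accuracy)” step is a classic double-key data entry check. As a tiny illustrative sketch (the field names and values are invented, not from IBM or Recode), the comparison might look like this in Python:

```python
# Illustrative double-key check: two independent transcriptions of the same record
# are compared field by field, and disagreements are flagged for a human to resolve.
def double_key_check(first_pass, second_pass):
    """Return the fields where the two transcriptions disagree."""
    return {field: (first_pass[field], second_pass[field])
            for field in first_pass
            if first_pass[field] != second_pass[field]}

entry_a = {"patient_id": "00127", "diagnosis": "melanoma", "outcome": "remission"}
entry_b = {"patient_id": "00127", "diagnosis": "melanona", "outcome": "remission"}

print(double_key_check(entry_a, entry_b))  # -> {'diagnosis': ('melanoma', 'melanona')}
```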

Chen notes that more and more workers in regulated industries, like healthcare, are mining for gold in their paper archives—manually inputting the valuable data hidden among the dusty pages. That is a lot of data entry. The article closes with a call for us all to remember this caveat: when considering each new and exciting potential application of AI, ask where the training data is coming from.

Cynthia Murrell, March 14, 2017

Big Data Requires More Than STEM Skills

March 13, 2017

Excelling in today’s big-data world will require training Canada’s youth in design and the arts as well as STEM subjects. That is the advice of a trio of academic researchers in that country, Patricio Davila, Sara Diamond, and Steve Szigeti, who declare, “There’s No Big Data Without Intelligent Interface” at the Globe and Mail. The article begins by describing why data management is now a crucial part of success throughout society, then emphasizes that we need creative types to design intuitive user interfaces and effective analytics representations. The researchers explain:

Here’s the challenge: For humans, data are meaningless without curation, interpretation and representation. All the examples described above require elegant, meaningful and navigable sensory interfaces. Adjacent to the visual are emerging creative, applied and inclusive design practices in data “representation,” whether it’s data sculpture (such as 3-D printing, moulding and representation in all physical media of data), tangible computing (wearables or systems that manage data through tactile interfaces) or data sonification (yes, data can make beautiful music).

Infographics is the practice of displaying data, while data visualization or visual analytics refers to tools or systems that are interactive and allow users to upload their own data sets. In a world increasingly driven by data analysis, designers, digital media artists, and animators provide essential tools for users. These interpretive skills stand side by side with general literacy, numeracy, statistical analytics, computational skills and cognitive science.
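The distinction the researchers draw between a static infographic and interactive visual analytics is easy to picture in code. Here is a minimal Python sketch in which the user supplies their own data set and gets a chart they can explore; the file name and column names are hypothetical.

```python
# Minimal "visual analytics" sketch: the user loads their own data set and gets an
# interactive view of it, rather than a fixed infographic. (Names are assumptions.)
import pandas as pd
import matplotlib.pyplot as plt

def plot_user_data(csv_path, x_column, y_column):
    data = pd.read_csv(csv_path)          # the user's own data set
    data.plot(x=x_column, y=y_column)     # pandas wraps matplotlib for the chart
    plt.title(f"{y_column} by {x_column}")
    plt.show()                            # interactive window: zoom, pan, inspect

# Example call (hypothetical file):
# plot_user_data("ridership.csv", "month", "riders")
```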

We also learn about several specific projects undertaken by faculty members at OCAD University, where our three authors are involved in the school’s Visual Analytics Lab. For example, the iCity project addresses transportation network planning in cities, and the Care and Condition Monitor is a mobile app designed to help patients and their healthcare providers better work together in pursuit of treatment goals. The researchers conclude with an appeal to their nation’s colleges and universities to develop programs that incorporate data management, data numeracy, data analysis, and representational skills early and often. Good suggestion.

Cynthia Murrell, March 13, 2017

Parlez-Vous Qwant, N’est-Ce Pas?

March 2, 2017

One of Google’s biggest rivals is Yandex, at least in Russia. Yandex is a Russian-owned and -operated search engine and, depending on the statistics, is more popular in Russia than Google. It stands to reason that a search engine built and designed by native speakers has a significant advantage over foreign competition, and it looks like France wants a chance to beat Google. Search Engine Journal reports on it in “Qwant, A French Search Engine, Thinks It Can Take On Google-Here’s Why.”

Qwant was only founded in 2013, and it has grown to serve twenty-one million monthly users in thirty countries. The French search engine has seen 70% growth each year and will see more with its recent integration with Firefox and a soon-to-be-launched mobile app. Qwant is very similar to DuckDuckGo in that it does not collect user data. It also boasts more search categories than news, images, and video; these include music, social media, cars, health, and others. Qwant has an interesting philosophy:

The company also has a unique philosophy that artificial intelligence and digital assistants can be educated without having to collect data on users. That’s a completely different philosophy than what is shared by Google, which collects every bit of information it can about users to fuel things like Google Home and Google Allo.

Qwant still wants to make a profit through pay-per-click and future partnerships with eBay and TripAdvisor, but it will do so without compromising users’ privacy. Qwant has a unique approach to search and to building AI assistants, but it has a long way to go before it reaches Google heights.

It needs to engage more users not only on laptops and desktop computers but also on mobile devices, and it needs to form more partnerships with other browsers.

Bonne chance, Qwant! But could you share how you plan to make AI assistants without user data?

Whitney Grace, March 2, 2017

 

U.S. Government Keeping Fewer New Secrets

February 24, 2017

We have good news and bad news for fans of government transparency. In its Secrecy News blog, the Federation of American Scientists reports, “Number of New Secrets in 2015 Near Historic Low.” Writer Steven Aftergood explains:

The production of new national security secrets dropped precipitously in the last five years and remained at historically low levels last year, according to a new annual report released today by the Information Security Oversight Office.

There were 53,425 new secrets (‘original classification decisions’) created by executive branch agencies in FY 2015. Though this represents a 14% increase from the all-time low achieved in FY 2014, it is still the second lowest number of original classification actions ever reported. Ten years earlier (2005), by contrast, there were more than 258,000 new secrets.

The new data appear to confirm that the national security classification system is undergoing a slow-motion process of transformation, involving continuing incremental reductions in classification activity and gradually increased disclosure. …

Meanwhile, ‘derivative classification activity,’ or the incorporation of existing secrets into new forms or products, dropped by 32%. The number of pages declassified increased by 30% over the year before.

A marked decrease in government secrecy—that’s the good news. On the other hand, the report reveals some troubling findings. For one thing, costs are not going down alongside classifications; in fact, they rose by eight percent last year. Also, response times to mandatory declassification requests (MDRs) are growing, leaving over 14,000 such requests to languish for over a year each. Finally, fewer newly classified documents carry the “declassify in ten years or less” specification, which means fewer items will become declassified automatically down the line.

Such red-tape tangles notwithstanding, the reduction in secret classifications does look like a sign that the government is moving toward more transparency. Can we trust the trajectory?

Cynthia Murrell, February 24, 2017
