DARPA Progresses on Refining Data Analysis

June 12, 2017

The ideal data analysis platform for global intelligence would take all the data in the world and rapidly make connections, alerting law enforcement or the military about potential events before they happen. It would also make it downright impossible for bad actors to hide their tracks. Our government seems to be moving toward that goal with AIDA, or Active Interpretation of Disparate Alternatives. DARPA discusses the project in its post, “DARPA Wades into Murky Multimedia Information Streams to Catch Big Meaning.” The agency states:

The goal of AIDA is to develop a multi-hypothesis ‘semantic engine’ that generates explicit alternative interpretations or meaning of real-world events, situations, and trends based on data obtained from an expansive range of outlets. The program aims to create technology capable of aggregating and mapping pieces of information automatically derived from multiple media sources into a common representation or storyline, and then generating and exploring multiple hypotheses about the true nature and implications of events, situations, and trends of interest.

‘It is a challenge for those who strive to achieve and maintain an understanding of world affairs that information from each medium is often analyzed independently, without the context provided by information from other media,’ said Boyan Onyshkevych, program manager in DARPA’s Information Innovation Office (I2O). ‘Often, each independent analysis results in only one interpretation, with alternate interpretations eliminated due to lack of evidence even in the absence of evidence that would contradict those alternatives. When these independent, impoverished analyses are combined, generally late in the analysis process, the result can be a single apparent consensus view that does not reflect a true consensus.’

AIDA’s goal of presenting an accurate picture of overall context early on will help avoid that problem. The platform is to assign a confidence level to each piece of information it processes and each hypothesis it generates. It will also, they hope, be able to correct for journalistic spin by examining variables and probabilities. Is the intelligence community about to gain an analysis platform capable of chilling accuracy?
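DARPA has not published AIDA’s internals, so the following is only a minimal sketch of the multi-hypothesis idea described above: keep every alternative interpretation alive, attach a confidence to each piece of evidence, and rank the hypotheses rather than discarding all but one. The class names, the averaged-confidence score, and the sample claims are all illustrative assumptions, not the agency’s design.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source: str        # e.g. "wire report", "social media"
    claim: str
    confidence: float  # 0.0-1.0, reliability of this extracted item

@dataclass
class Hypothesis:
    interpretation: str
    evidence: list = field(default_factory=list)

    def score(self) -> float:
        # One simple scheme: average the confidence of supporting evidence.
        if not self.evidence:
            return 0.0
        return sum(e.confidence for e in self.evidence) / len(self.evidence)

def rank_hypotheses(hypotheses):
    # Keep every alternative, ordered by confidence, instead of
    # eliminating interpretations early for lack of evidence.
    return sorted(hypotheses, key=lambda h: h.score(), reverse=True)

protest = Hypothesis("planned protest", [
    Evidence("wire report", "crowd forming downtown", 0.9),
    Evidence("social media", "march announced", 0.6),
])
parade = Hypothesis("holiday parade", [
    Evidence("social media", "festive imagery", 0.4),
])
ranked = rank_hypotheses([protest, parade])
```

A real system would use a far more sophisticated combination rule than a plain average, but the point of the sketch is the shape of the output: a ranked list of scored alternatives, not a single consensus view.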

Cynthia Murrell, June 12, 2017

Bibliophiles Have 25 Million Reasons to Smile

June 6, 2017

The US Library of Congress has released 25 million records from its collection online, and anyone with Internet access is free to use them.

According to the Science Alert article titled “The US Library of Congress Just Put 25 Million Records Online, Free of Charge”:

The bibliographic data sets, like digital library cards, cover music, books, maps, manuscripts, and more, and their publication online marks the biggest release of digital records in the Library’s history.

The Library of Congress has been on a digitization spree for some time, and users can expect more records to be made available online in the near future. The challenge, however, is retrieving the books or information that the user needs. The web interface is still complicated and not user-friendly. In short, the enterprise search function is a mess. What the Library of Congress really needs is a user-friendly and efficient way of delivering its vast collection of knowledge to bibliophiles.

Vishal Ingole, June 6, 2017

Partnership Hopes to Improve Healthcare through Technology

June 5, 2017

A healthcare research organization and a data warehousing and analytics firm are teaming up to improve patient care, Healthcare IT News reports in, “Health Catalyst, Regenstrief Partner to Commercialize Natural Language Processing Technology.” The technology at hand is the nDepth (NLP Data Extraction Providing Targeted Healthcare) platform, Regenstrief’s specialized data analysis tool. Reporter Bernie Monegain elaborates:

Regenstrief’s nDepth is artificial intelligence-powered text analytics technology. It was developed within the Indiana Health Information Exchange, the largest and oldest HIE in the country. Regenstrief fine-tuned nDepth through extensive and repeated use, searching more than 230 million text records from more than 17 million patients. The goal of the partnership is to speed improvements in patient care by unlocking the unstructured data within electronic health records. Health Catalyst will incorporate nDepth into its data analytics platform in use by health systems that together serve 85 million patients across the country.

In addition, clinicians are contributing their knowledge to build and curate clinical domain expertise and phenotype libraries to augment the platform. Another worthy contributor is Memorial Hospital at Gulfport, which was a co-development partner and was the first to implement the Health Catalyst/nDepth system.

Based in Indianapolis, the Regenstrief Institute was founded in 1969 with a mission—to facilitate the use of technology to improve patient care. Launched in 2008, Health Catalyst is much younger but holds a similar purpose—to improve healthcare with data analysis and information sharing technologies. That enterprise is based in Salt Lake City.

Cynthia Murrell, June 5, 2017

How Data Science Pervades

May 2, 2017

We think Information Management may be overstating a bit with the headline, “Data Science Underlies Everything the Enterprise Now Does.”  While perhaps not underpinning quite “everything,” the use of data analysis has indeed spread throughout many companies (especially larger ones).

Writer Michael O’Connell cites a few key developments over the last year alone, including the rise of representative data, a wider adoption of predictive analysis, and the refinement of customer analytics. He predicts even more changes in the coming year, then uses a hypothetical telecom company for a series of examples. He concludes:

You’ll note that this model represents a significant broadening beyond traditional big data/analytics functions. Such task alignment and comprehensive integration of analytics functions into specific business operations enable high-value digital applications ranging far beyond our sample Telco’s churn mitigation — cross-selling, predictive and condition-based maintenance, fraud detection, price optimization, and logistics management are just a few areas where data science is making a huge difference to the bottom line.

See the article for more on the process of turning data into action, as illustrated with the tale of that imaginary telecom’s data-wrangling adventure.

Cynthia Murrell, May 2, 2017

The Design Is Old School, but the Info Is Verified

April 5, 2017

For a moment, let us go back to the 1990s.  The Internet was still new, flash animation was “da bomb” (to quote the vernacular of the day), and Web site design was plain HTML.  You could see prime examples of early Web site design by visiting the Internet Archive, but why hit the time machine search button when you can simply visit RefDesk.com?

RefDesk is reminiscent of an old AOL landing page, except it lacks the cheesy graphics and provides higher quality information.  RefDesk is an all-inclusive reference and fact-checking Web site that pools links to various sources of quality information into one complete resource.  It keeps things simple with a plain HTML format, and it groups sources together based on content and relevance, such as search engines, news outlets, weather, dictionaries, games, white pages, yellow pages, and specialized topics that change daily.  RefDesk’s mission is to take the guesswork out of the Internet:

The Internet is the world’s largest library containing millions of books, artifacts, images, documents, maps, etc. There is but one small problem in this library: everything is scattered about on the floor, with growing hordes of confused and bewildered users frantically shifting through the maze, occasionally crying out, ‘Great Scott, look at what I just found!’ Enter refdesk.

Refdesk has three goals: (1) fast access, (2) intuitive and easy navigation and (3) comprehensive content, rationally indexed. The prevailing philosophy here is: simplicity. “Simplicity is the natural result of profound thought.” And, very difficult to achieve.

Refdesk is the one-stop source for verified, credible resources because it is run by a team dedicated to fishing the facts out of the filth that runs amok on other sites.  It set up shop in 1995, and the only thing that has changed is the information.  It might be basic, it might be a tad bland, but the content is curated to ensure credibility.

Elementary school kids, take note: you can use this on your history report.

Whitney Grace, April 5, 2017


Dataminr Presented to Foreign Buyers Through Illegal Means

April 4, 2017

One thing that any company wants is more profit.  Companies generate more profit by selling their products and services to more clients.  Dataminr wanted to add more clients to its roster, and a former Hillary Clinton aide wanted to use his political connections to land more clients for Dataminr, of the foreign variety.  The Verge has the scoop on how this happened in, “Leaked Emails Reveal How Dataminr Was Pitched To Foreign Governments.”

Dataminr is a company specializing in analyzing Twitter data and turning it into actionable data sets in real-time.  The Clinton aide’s personal company, Beacon Global Strategies, arranged to meet with at least six embassies and pitch Dataminr’s services.  All of this came to light when classified emails were leaked to the public on DCLeaks.com:

The leaked emails shed light on the largely unregulated world of international lobbying in Washington, where “strategic advisors,” “consultants,” and lawyers use their US government experience to benefit clients and themselves, while avoiding public scrutiny both at home and overseas.

Beacon isn’t registered to lobby in Washington. The firm reportedly works for defense contractors and cybersecurity companies, but it hasn’t made its client list public, citing non-disclosure agreements. Beacon’s relationship with Dataminr has not been previously reported.

The aide sold Dataminr’s services in a way that suggests they could be used for surveillance.  Beacon even described Dataminr as a way to find an individual’s digital footprint.  Twitter’s developer agreement forbids third parties from selling user data if it will be used for surveillance.  Yet Twitter owns a 5% stake in Dataminr and allows the company direct access to its data firehose.

It sounds like some back-alley dealing took place.  The ultimate goal for the Clinton aide was to make money and possibly funnel that back into his company or get a kickback from Dataminr.  The US Lobbying Disclosure Act makes it illegal for a company to act in this manner, but there are loopholes to skirt around it.

This is once again more proof that while a tool can be used for good, it can also be used in a harmful manner.  It raises the question, though: if people leave their personal information all over the Internet, is it not free for the taking?

Whitney Grace, April 4, 2017

Alternative (Aka Fake) News Not Going Anywhere

March 29, 2017

The article titled “The Rise of Fake News Amidst the Fall of News Media” on Silicon Valley Watcher makes a convincing argument that fake news is the inevitable result of the collective failure to invest in professional media. The author, Tom Foremski, used to write for the Financial Times. He argues that the nearly continuous layoffs at professional media organizations such as the New York Times, Salon, The Guardian, AP, Daily Dot, and IBT illustrate the lack of a sustainable business model for professional news media. The article states,

People won’t pay for the news media they should be reading but special interest groups will gladly pay for the media they want them to read. We have important decisions to make about a large number of issues such as the economy, the environment, energy, education, elder healthcare and those are just the ones that begin with the letter “E” — there’s plenty more issues. With bad information we won’t be able to make good decisions. Software engineers call this GIGO – Garbage In Garbage Out.

This issue affects us all; fake news even got a man elected to the highest office in the land.  With Donald Trump demonstrating on a daily basis that he has no interest in the truth, whether regarding the size of the crowds at his inauguration or the reason he lost the popular vote to Hillary Clinton, the news industry is already in a crouch. Educating people to differentiate between true and false news is nearly impossible when it is so much easier and more comfortable for people to read only what reconfirms their worldview. Foremski leaves it to the experts and the visionaries to solve the problem and find a way to place a monetary value on professional news media.

Chelsea Kerwin, March 29, 2017

Intelligence Researchers Pursue Comprehensive Text Translation

March 27, 2017

The US Intelligence Advanced Research Projects Agency (IARPA) is seeking programmers to help develop a tool that can quickly search text in over 7,000 languages. ArsTechnica reports on the initiative (dubbed Machine Translation for English Retrieval of Information in Any Language, or MATERIAL) in the article, “Intelligence Seeks a Universal Translator for Text Search in Any Language.” As it is, it takes time to teach a search algorithm to translate each language. For the most-used tongues, this process is quite well along, but not so for “low-resource” languages. Writer Sean Gallagher explains:

To get reliable translation of text based on all variables could take years of language-specific training and development. Doing so for every language in a single system—even to just get a concise summary of what a document is about, as MATERIAL seeks to do—would be a tall order. Which is why one of the goals of MATERIAL, according to the IARPA announcement, ‘is to drastically decrease the time and data needed to field systems capable of fulfilling an English-in, English-out task.’

Those taking on the MATERIAL program will be given access to a limited set of machine translation and automatic speech recognition training data from multiple languages ‘to enable performers to learn how to quickly adapt their methods to a wide variety of materials in various genres and domains,’ the announcement explained. ‘As the program progresses, performers will apply and adapt these methods in increasingly shortened time frames to new languages.’

Interested developers should note that candidates are not expected to have foreign-language expertise. Gallagher notes that IARPA plans to publish its research publicly; he looks forward to wider access to foreign-language documents down the road, should the organization meet its goal.

Cynthia Murrell, March 27, 2017

MBAs Under Siege by Smart Software

March 23, 2017

The article titled “Silicon Valley Hedge Fund Takes Over Wall Street With AI Trader” on Bloomberg explains how Sentient Technologies Inc. plans to take the human error out of the stock market. Babak Hodjat co-founded the company and spent the past 10 years building an AI system capable of reviewing billions of pieces of data and learning trends and techniques to make money by trading stocks. The article states that the system is based on evolution,

According to patents, Sentient has thousands of machines running simultaneously around the world, algorithmically creating what are essentially trillions of virtual traders that it calls “genes.” These genes are tested by giving them hypothetical sums of money to trade in simulated situations created from historical data. The genes that are unsuccessful die off, while those that make money are spliced together with others to create the next generation… Sentient can squeeze 1,800 simulated trading days into a few minutes.

Hodjat believes that handing the reins over to a machine is wise because it eliminates bias and emotion. But outsiders wonder whether investors will be willing to put their trust entirely in a system. Other hedge funds, like Man AHL, rely on machine learning too, but nowhere near to the extent Sentient does. As Sentient brings in outside investors later this year, the success of the platform will become clearer.
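Sentient’s actual system is proprietary, but the evolutionary scheme the patent excerpt describes (virtual traders scored on historical data, losers culled, winners spliced together) is a classic genetic algorithm. Here is a hedged toy sketch: the price series, the moving-average “gene,” and all parameter names are my own illustration, not Sentient’s implementation.

```python
import random

random.seed(42)

# Toy "historical" price series (synthetic, not real market data).
PRICES = [100 + 10 * ((i * 7) % 13) / 13.0 for i in range(60)]

def random_gene():
    # A "gene" here is a trivial strategy: go long when the short moving
    # average exceeds the long one by a learned threshold.
    return {"short": random.randint(2, 5),
            "long": random.randint(6, 15),
            "threshold": random.uniform(-1.0, 1.0)}

def fitness(gene, prices):
    # Simulated trading over the historical series: hold one unit for the
    # next step whenever the signal fires, and score total profit.
    profit = 0.0
    for t in range(gene["long"], len(prices) - 1):
        short_ma = sum(prices[t - gene["short"]:t]) / gene["short"]
        long_ma = sum(prices[t - gene["long"]:t]) / gene["long"]
        if short_ma - long_ma > gene["threshold"]:
            profit += prices[t + 1] - prices[t]
    return profit

def splice(a, b):
    # Crossover: the child inherits each parameter from a random parent.
    return {k: random.choice([a[k], b[k]]) for k in a}

def evolve(generations=20, pop_size=30):
    population = [random_gene() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda g: fitness(g, PRICES),
                        reverse=True)
        survivors = ranked[: pop_size // 2]   # unsuccessful genes die off
        children = [splice(random.choice(survivors), random.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children     # winners are spliced together
    return max(population, key=lambda g: fitness(g, PRICES))

best = evolve()
```

Because the top half of each generation survives unchanged, the best fitness can never decrease between generations; the real system presumably runs this loop across thousands of machines and trillions of virtual traders rather than a population of thirty.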

Chelsea Kerwin, March 23, 2017

The Human Effort Behind AI Successes

March 14, 2017

An article at Recode, “Watson Claims to Predict Cancer, but Who Trained It To Think,” reminds us that even the most successful AI software was trained by humans, using data collected and input by humans. We have developed high hopes for AI, expecting it to help us cure disease, make our roads safer, and put criminals behind bars, among other worthy endeavors. However, we must not overlook the datasets upon which these systems are built, and the human labor used to create them. Writer Kuang Chen, CEO of DaaS firm Captricity, points out:

The emergence of large and highly accurate datasets have allowed deep learning to ‘train’ algorithms to recognize patterns in digital representations of sounds, images and other data that have led to remarkable breakthroughs, ones that outperform previous approaches in almost every application area. For example, self-driving cars rely on massive amounts of data collected over several years from efforts like Google’s people-powered street canvassing, which provides the ability to ‘see’ roads (and was started to power services like Google Maps). The photos we upload and collectively tag as Facebook users have led to algorithms that can ‘see’ faces. And even Google’s 411 audio directory service from a decade ago was suspected to be an effort to crowdsource data to train a computer to ‘hear’ about businesses and their locations.

Watson’s promise to help detect cancer also depends on data: decades of doctor notes containing cancer patient outcomes. However, Watson cannot read handwriting. In order to access the data trapped in the historical doctor reports, researchers must have had to employ an army of people to painstakingly type and re-type (for accuracy) the data into computers in order to train Watson.

Chen notes that more and more workers in regulated industries, like healthcare, are mining for gold in their paper archives—manually inputting the valuable data hidden among the dusty pages. That is a lot of data entry. The article closes with a call for us all to remember this caveat: when considering each new and exciting potential application of AI, ask where the training data is coming from.

Cynthia Murrell, March 14, 2017
