March 24, 2017
Will search-and-discovery firm Diffeo’s recent acquisition give it the edge? Yahoo Finance shares, “Diffeo Acquires Meta Search and Launches New Offering.” Startup Meta Search developed a local computer and cloud search system that uses smart indexing to assign index terms and keep the terms consistent. Diffeo provides a range of advanced content processing services based on collaborative machine intelligence. The press release specifies:
Diffeo’s content discovery platform accelerates research analysts by applying text analytics and machine intelligence algorithms to users’ in-progress files, so that it can recommend content that fills in knowledge gaps — often before the user thinks of searching. Diffeo acts as a personal research assistant that scours both the user’s files and the Internet. The company describes its technology as collaborative machine intelligence.
Diffeo and Meta’s services complement each other. Meta provides unified search across the content on all of a user’s cloud platforms and devices. Diffeo’s Advanced Discovery Toolbox displays recommendations alongside in-progress documents to accelerate the work of research analysts by uncovering key connections.
Meta’s platform integrates cloud environments into a single keyword search interface, enabling users to search their files on all cloud drives, such as Dropbox, Google Drive, Slack and Evernote all at once. Meta also improves search quality by intelligently analyzing each document, determining the most important concepts, and automatically applying those concepts as ‘Smart Tags’ to the user’s documents.
This seems like a promising combination. Founded in 2012, Diffeo made Meta Search its first acquisition on January 10 of this year. The company is currently hiring. Meta Search, now called Diffeo Cloud Search, is based in Boston.
Cynthia Murrell, March 24, 2017
March 1, 2017
Mark Zuckerberg and his wife Priscilla Chan have dedicated a portion of their fortune to philanthropy issues through their own organization, the Chan Zuckerberg Initiative. Tech Crunch shares that one of their first acquisitions is to support scientific research, “Chan Zuckerberg Initiative Acquires And Will Free Up Science Search Engine Meta.”
Meta is a search engine dedicated to science research papers and it is powered by artificial intelligence. Chan and Zuckerberg plan to make Meta free in a few months, but only after they have enhanced it. Once released, Meta will help scientists find the latest papers in their study fields, which is awesome as these papers are usually blocked behind paywalls. What is even better is that Meta will also assist funding organizations with research and areas with potential for investment/impact. What makes Meta different from other search engines or databases is quite fantastic:
What’s special about Meta is that its AI recognizes authors and citations between papers so it can surface the most important research instead of just what has the best SEO. It also provides free full-text access to 18,000 journals and literature sources.
Meta co-founder and CEO Sam Molyneux writes that “Going forward, our intent is not to profit from Meta’s data and capabilities; instead we aim to ensure they get to those who need them most, across sectors and as quickly as possible, for the benefit of the world.
CZI invested $3 billion dedicated to curing all diseases and they already built the Biohub in San Francisco for medical research. Meta works like this:
Meta, formerly known as Sciencescape, indexes entire repositories of papers like PubMed and crawls the web, identifying and building profiles for the authors while analyzing who cites or links to what. It’s effectively Google PageRank for science, making it simple to discover relevant papers and prioritize which to read. It even adapts to provide feeds of updates on newly published research related to your previous searches.
Meta is an ideal search engine, because it crawls the entire Web (supposedly) and returns verified information, not to mention potential research partnerships and breakthroughs. This is the type of database researchers have dreamed of for years. Would CZI be willing to fund something similar for fields other than science? Will they run into trouble with other organizations less interested in philanthropy?
Whitney Grace, March 1, 2017
January 27, 2017
The article titled How a New AI Powered Search Engine Is Changing How Neuroscientists Do Research on Search Engine Watch discusses the new search engine geared towards scientific researchers. It is called Semantic Scholar, and it uses AI to provide a comprehensive resource to scientists. The article explains,
This new search engine is actually able to think and analyze a study’s worth. GeekWire notes that, “Semantic Scholar uses data mining, natural language processing, and computer vision to identify and present key elements from research papers.” The engine is able to understand when a paper is referencing its own study or results from another source. Semantic Scholar can then identify important details, pull figures, and compare one study to thousands of other articles within one field.
This ability to rank and sort papers by relevance is tremendously valuable given the vast number of academic papers online. Google Scholar, by comparison, might lead a researcher in the right direction with its index of over 200 million articles, it simply does not have the same level of access to metadata that researchers need such as how often a paper or author has been cited. The creators of Semantic Scholar are not interested in competing with Google, but providing a niche search engine tailored to meet the needs of the scientific community.
Chelsea Kerwin, January 27, 2017
January 16, 2017
Conventional search engines can effectively index text based content. However, Apache Tika, a system developed by Defense Advanced Research Projects Agency (DARPA) can identify and analyze all kinds of content. This might enable law enforcement agencies to track all kind of illicit activities over Dark Web and possibly end them.
An article by Christian Mattmann titled Could This Tool for the Dark Web Fight Human Trafficking and Worse? that appears on Startup Smart says:
At present the most easily indexed material from the web is text. But as much as 89 to 96 percent of the content on the internet is actually something else – images, video, audio, in all thousands of different kinds of non-textual data types. Further, the vast majority of online content isn’t available in a form that’s easily indexed by electronic archiving systems like Google’s.
Apache Tika, which Mattmann helped develop bridges the gap by analyzing Metadata of the content type and then identifying content of the file using techniques like Named Entity Recognition (NER). Apache Tika was instrumental in tracking down players in Panama Scandal.
If Apache Tika is capable of what it says, many illicit activities over Dark Web like human trafficking, drug and arms peddling can be stopped in its tracks. As the author points out in the article:
Employing Tika to monitor the deep and dark web continuously could help identify human- and weapons-trafficking situations shortly after the photos are posted online. That could stop a crime from occurring and save lives.
However, the system is not sophisticated enough to handle the amount of content that is out there. Being an open source code, in near future someone may be able to make it capable of doing so. Till then, the actors of Dark Web can heave a sigh of relief.
Vishal Ingole, January 16, 2017
November 1, 2016
I noted “NGA Chooses 16 Orgs for Disparate Data Challenge Phase 2.” “NGA” is the acronym for the National Geospatial Intelligence Agency. The geo-analytics folks at this unit do some fascinating things. The future, however, demands that today’s good enough is not sufficient. NGA tapped 15 outfits to do some poking around in their innovation tool chests. Here are the firms:
- App Symphony
- Blue Zoo
- Meta DDC
Recognize any of these outfit? Familiarity might be a useful task.
Stephen E Arnold, November 1, 2016
October 26, 2016
The technology blog post from Danial Miessler titled Machine Learning is the New Statistics strives to convey a sense of how crucial Machine Learning has become in terms of how we gather information about the world around us. Rather than dismissing Machine Learning as a buzzword, the author heralds Machine Learning as an advancement in our ability to engage with the world around us. The article states,
So Machine Learning is not merely a new trick, a trend, or even a milestone. It’s not like the next gadget, instant messaging, or smartphones, or even the move to mobile. It’s nothing less than a foundational upgrade to our ability to learn about the world, which applies to nearly everything else we care about. Statistics greatly magnified our ability to do that, and Machine Learning will take us even further.
The article breaks down the steps of our ability to analyze our own reality, moving from randomly explaining events, to explanations based on the past, to explanations based on comparisons with numerous trends and metadata. The article positions Machine Learning as the next step, involving an explanation that compares events but simultaneously progresses the comparison by coming up with new models. The difference is of course that Machine Learning offers the ability of continuous model improvement. If you are interested, the blog also offers a Machine Learning Primer.
February 29, 2016
A new one-to-one messaging tool for journalists has launched after two years in development. The article Ricochet uses power of the dark web to help journalists, sources dodge metadata laws from The Age describes this new darknet-based software. The unique feature of this software, Ricochet, in comparison to others used by journalists such as Wickr, is that it does not use a server but rather Tor. Advocates acknowledge the risk of this Dark Web software being used for criminal activity but assert the aim is to provide sources and whistleblowers an anonymous channel to securely release information to journalists without exposure. The article explains,
“Dr Dreyfus said that the benefits of making the software available would outweigh any risks that it could be used for malicious purposes such as cloaking criminal and terrorist operations. “You have to accept that there are tools, which on balance are a much greater good to society even though there’s a tiny possibility they could be used for something less good,” she said. Mr Gray argued that Ricochet was designed for one-to-one communications that would be less appealing to criminal and terrorist organisers that need many-to-many communications to carry out attacks and operations. Regardless, he said, the criminals and terrorists had so many encryption and anonymising technologies available to them that pointing fingers at any one of them was futile.”
Online anonymity is showing increasing demand as evidenced through the recent launch of several new Tor-based softwares like Ricochet, in addition to Wickr and consumer-oriented apps like Snapchat. The Dark Web’s user base appears to be growing and diversifying. Will public perception follow suit?
Megan Feil, February 29, 2016
February 9, 2016
I noted a blog post called “From Discovery to Selection: Announcing the Seattle Accelerator’s Third Batch.” The post lists companies which Microsoft wants to nurture. Here’s the list:
- Affinio: Audience insights
- Agolo: Summarization of text
- Clarify: Rich media search
- Defined Crowd: Natural language processing
- Knomos: Palantir style analysis
- Medwhat: Doctor made of soft software
- OneBridge: Middleware for Microsoft cloud
- Percolata: Retail staff monitoring
- Plexuss: Palantir style analysis
- Sim Machines: Similarity search and pattern recognition
Net net: Microsoft continues to hunt for solutions in search and analytics. There is a touch of “me too” in the niche plays too. Persistence is a virtue.
Stephen E Arnold, February 9, 2016
February 2, 2016
A friend recently told me how they can go months avoiding suspicious emails, spyware, and Web sites on her computer, but the moment she hands her laptop over to her father he downloads a virus within an hour. Despite the technology gap existing between generations, the story goes to show how easy it is to deceive and steal information these days. ExpertClick thinks that metadata might hold the future means for cyber security in “What Metadata And Data Analytics Mean For Data Security-And Beyond.”
The article uses biological analogy to explain metadata’s importance: “One of my favorite analogies is that of data as proteins or molecules, coursing through the corporate body and sustaining its interrelated functions. This analogy has a special relevance to the topic of using metadata to detect data leakage and minimize information risk — but more about that in a minute.”
This plays into new companies like, Ayasdi, using data to reveal new correlations using different methods than the standard statistical ones. The article compares this to getting to the data atomic level, where data scientists will be able to separate data into different elements and increase the analysis complexity.
“The truly exciting news is that this concept is ripe for being developed to enable an even deeper type of data analytics. By taking the ‘Shape of Data’ concept and applying to a single character of data, and then capturing that shape as metadata, one could gain the ability to analyze data at an atomic level, revealing a new and unexplored frontier. Doing so could bring advanced predictive analytics to cyber security, data valuation, and counter- and anti-terrorism efforts — but I see this area of data analytics as having enormous implications in other areas as well.”
There are more devices connected to the Internet than ever before and 2016 could be the year we see a significant rise in cyber attacks. New ways to interpret data will leverage predictive and proactive analytics to create new ways to fight security breaches.
December 16, 2015
We thought it was a problem if law enforcement officials did not know how the Internet and Dark Web worked as well as the capabilities of eDiscovery tools, but a law firm that does not know how to work with data-mining tools much less the importance of technology is losing credibility, profit, and evidence for cases. According to Information Week in “Data, Lawyers, And IT: How They’re Connected” the modern law firm needs to be aware of how eDiscovery tools, predictive coding, and data science work and see how they can benefit their cases.
It can be daunting trying to understand how new technology works, especially in a law firm. The article explains how the above tools and more work in four key segments: what role data plays before trial, how it is changing the courtroom, how new tools pave the way for unprecedented approaches to law practice, how data is improving how law firms operate.
Data in pretrial amounts to one word: evidence. People live their lives via their computers and create a digital trail without them realizing it. With a few eDiscovery tools lawyers can assemble all necessary information within hours. Data tools in the courtroom make practicing law seem like a scenario out of a fantasy or science fiction novel. Lawyers are able to immediately pull up information to use as evidence for cross-examination or to validate facts. New eDiscovery tools are also good to use, because it allows lawyers to prepare their arguments based on the judge and jury pool. More data is available on individual cases rather than just big name ones.
“The legal industry has historically been a technology laggard, but it is evolving rapidly to meet the requirements of a data-intensive world.
‘Years ago, document review was done by hand. Metadata didn’t exist. You didn’t know when a document was created, who authored it, or who changed it. eDiscovery and computers have made dealing with massive amounts of data easier,’ said Robb Helt, director of trial technology at Suann Ingle Associates.”
Legal eDiscovery is one of the main branches of big data that has skyrocketed in the past decade. While the examples discussed here are employed by respected law firms, keep in mind that eDiscovery technology is still new. Ambulance chasers and other law firms probably do not have a full IT squad on staff, so when learning about lawyers ask about their eDiscovery capabilities.