From the Desk of Captain Obvious: How Image Recognition Mostly Works

July 8, 2019

Want to be reminded how super duper image recognition systems work? If so, navigate to the capitalist tool’s write up “Facebook’s ALT Tags Remind Us That Deep Learning Still Sees Images as Keywords.” The DarkCyber team knows that this headline is designed to capture clicks and certainly does not apply to every image recognition system available. But if an image is linked via metadata to something other than a numeric code, then images are indeed mapped to words. Words, it turns out, remain useful in our video- and picture-first world.

Nevertheless, the write up offers some interesting comments, which is what the DarkCyber research team expects from the capitalist tool. (One of our DarkCyber team members saw Malcolm Forbes at a Manhattan eatery keeping a close eye on a spectacularly gaudy motorcycle. Alas, Mr. Forbes is no longer with us, although the motorcycle probably survives somewhere, unlike the “old” Forbes’ editorial policies.)

Here’s the passage:

For all the hype and hyperbole about the AI revolution, today’s best deep learning content understanding algorithms are still remarkably primitive and brittle. In place of humans’ rich semantic understanding of imagery, production image recognition algorithms see images merely through predefined galleries of metadata tags they apply based on brittle and naïve correlative models that are trivially confused.

Yep, and ultimately the hundreds of millions of driver license pictures will be mapped to words; for example, name, address, city, state, zip, along with a helpful pointer to other data about the driver.

The capitalist tool reminds the patient reader:

Today’s deep learning algorithms “see” imagery by running it through a set of predefined models that look for simple surface-level correlative patterns in the arrangement of its pixels and output a list of subject tags much like those human catalogers half a century ago.

Once again, no pushback from Harrod’s Creek. However, it is disappointing that newer research is not referenced in the article; for example, the companies involved in DARPA’s Upside program.
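
To make the point concrete, here is a minimal sketch of the image-to-keywords pipeline the quoted passage describes. It assumes PyTorch and torchvision are installed and that a list of ImageNet class labels is available locally; the file name and the choice of ResNet-50 are illustrative, not a description of Facebook’s system.

    import torch
    from PIL import Image
    from torchvision import models, transforms

    # A fixed, predefined model: it can only emit labels from its training vocabulary.
    model = models.resnet50(pretrained=True)   # older API; newer torchvision uses weights=
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

    def image_to_tags(path, labels, k=5):
        """Map one image to the model's top-k keyword tags with confidence scores."""
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            probs = model(x).softmax(dim=1)[0]
        top = probs.topk(k)
        return [(labels[int(i)], float(p)) for p, i in zip(top.values, top.indices)]

The output is exactly the kind of metadata the quotation describes: a handful of tags drawn from a predefined gallery, not a semantic understanding of the scene.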

Stephen E Arnold, July 8, 2019

Twitch Incorporates ClipMine Discovery Tools

September 18, 2017

Gameplay-streaming site Twitch has adapted the platform of their acquisition ClipMine, originally developed for adding annotations to online videos, into a metadata-generator for its users. (Twitch is owned by Amazon.) TechCrunch reports the development in, “Twitch Acquired Video Indexing Platform ClipMine to Power New Discovery Features.” Writer Sarah Perez tells us:

The startup’s technology is now being put to use to translate visual information in videos – like objects, text, logos and scenes – into metadata that can help people more easily find the streams they want to watch. Launched back in 2015, ClipMine had originally introduced a platform designed for crowdsourced tagging and annotations. The idea then was to offer a technology that could sit over top videos on the web – like those on YouTube, Vimeo or DailyMotion – that allowed users to add their own annotations. This, in turn, would help other viewers find the part of the video they wanted to watch, while also helping video publishers learn more about which sections were getting clicked on the most.

Based in Palo Alto, ClipMine went on to make indexing tools for the e-sports field and to incorporate computer vision and machine learning into their work. Their platform’s ability to identify content within videos caught Twitch’s eye; Perez explains:

Traditionally, online video content is indexed much like the web – using metadata like titles, tags, descriptions, and captions. But Twitch’s streams are live, and don’t have as much metadata to index. That’s where a technology like ClipMine can help. Streamers don’t have to do anything differently than usual to have their videos indexed, instead, ClipMine will analyze and categorize the content in real-time.

ClipMine’s technology has already been incorporated into stream-discovery tools for two games from Blizzard Entertainment, “Overwatch” and “Hearthstone;” see the article for more specifics on how and why. Through its blog, Twitch indicates that more innovations are on the way.
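
ClipMine’s actual pipeline is not public, but the indexing idea is easy to sketch: sample frames from a live stream, have a vision model emit tags, and file those tags in a time-coded inverted index that discovery features can query. The tag-producing model is assumed here; only the index structure is shown.

    from collections import defaultdict

    # tag -> list of (stream_id, seconds_into_stream) sightings
    stream_index = defaultdict(list)

    def index_frame(stream_id, timestamp, tags):
        """Record the tags a sampled frame produced, keyed for later discovery queries."""
        for tag in tags:
            stream_index[tag].append((stream_id, timestamp))

    def find_streams(tag):
        """Return the streams (and offsets) where a tag has been seen so far."""
        return stream_index.get(tag, [])

    # e.g., a hypothetical vision model flags an Overwatch hero-select screen at 95 seconds:
    index_frame("streamer_42", 95, ["overwatch", "hero_select", "mercy"])
    print(find_streams("overwatch"))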

Cynthia Murrell, September 18, 2017

Diffeo Incorporates Meta Search Technology

March 24, 2017

Will search-and-discovery firm Diffeo’s recent acquisition give it the edge? Yahoo Finance shares, “Diffeo Acquires Meta Search and Launches New Offering.” Startup Meta Search developed a search system spanning the local computer and the cloud that uses smart indexing to assign index terms and keep them consistent. Diffeo provides a range of advanced content processing services based on collaborative machine intelligence. The press release specifies:

Diffeo’s content discovery platform accelerates research analysts by applying text analytics and machine intelligence algorithms to users’ in-progress files, so that it can recommend content that fills in knowledge gaps — often before the user thinks of searching. Diffeo acts as a personal research assistant that scours both the user’s files and the Internet. The company describes its technology as collaborative machine intelligence.

Diffeo and Meta’s services complement each other. Meta provides unified search across the content on all of a user’s cloud platforms and devices. Diffeo’s Advanced Discovery Toolbox displays recommendations alongside in-progress documents to accelerate the work of research analysts by uncovering key connections.

Meta’s platform integrates cloud environments into a single keyword search interface, enabling users to search their files on all cloud drives, such as Dropbox, Google Drive, Slack and Evernote all at once. Meta also improves search quality by intelligently analyzing each document, determining the most important concepts, and automatically applying those concepts as ‘Smart Tags’ to the user’s documents.

This seems like a promising combination. Founded in 2012, Diffeo made Meta Search its first acquisition on January 10 of this year. The company is currently hiring. Meta Search, now called Diffeo Cloud Search, is based in Boston.
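
Meta’s “Smart Tags” algorithm is not documented publicly, but the general idea of pulling the most important concepts out of each document can be sketched with TF-IDF weighting. The example below assumes a recent scikit-learn; the sample documents are made up.

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "Quarterly revenue grew on strong cloud adoption and enterprise search demand.",
        "The research memo covers knowledge graphs, entity resolution, and cloud storage.",
    ]

    vec = TfidfVectorizer(stop_words="english")
    weights = vec.fit_transform(docs)
    terms = vec.get_feature_names_out()

    for i in range(len(docs)):
        row = weights[i].toarray().ravel()
        top = row.argsort()[::-1][:3]        # three highest-weighted terms per document
        print(f"doc {i} smart tags:", [terms[j] for j in top if row[j] > 0])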

Cynthia Murrell, March 24, 2017

Chan and Zuckerberg Invest in Science Research Search Engine, Meta

March 1, 2017

Mark Zuckerberg and his wife Priscilla Chan have dedicated a portion of their fortune to philanthropic causes through their own organization, the Chan Zuckerberg Initiative. TechCrunch shares that one of the initiative’s first acquisitions is meant to support scientific research: “Chan Zuckerberg Initiative Acquires And Will Free Up Science Search Engine Meta.”

Meta is a search engine dedicated to scientific research papers, and it is powered by artificial intelligence. Chan and Zuckerberg plan to make Meta free in a few months, but only after they have enhanced it. Once released, Meta will help scientists find the latest papers in their fields, which is awesome, as these papers are usually locked behind paywalls. Even better, Meta will also help funding organizations identify research areas with potential for investment and impact. What makes Meta different from other search engines and databases is quite fantastic:

What’s special about Meta is that its AI recognizes authors and citations between papers so it can surface the most important research instead of just what has the best SEO. It also provides free full-text access to 18,000 journals and literature sources.

Meta co-founder and CEO Sam Molyneux writes that “Going forward, our intent is not to profit from Meta’s data and capabilities; instead we aim to ensure they get to those who need them most, across sectors and as quickly as possible, for the benefit of the world.”

CZI has committed $3 billion to the goal of curing all diseases and has already built the Biohub in San Francisco for medical research. Meta works like this:

Meta, formerly known as Sciencescape, indexes entire repositories of papers like PubMed and crawls the web, identifying and building profiles for the authors while analyzing who cites or links to what. It’s effectively Google PageRank for science, making it simple to discover relevant papers and prioritize which to read. It even adapts to provide feeds of updates on newly published research related to your previous searches.

Meta sounds like an ideal search engine because it crawls the entire Web (supposedly) and returns verified information, not to mention surfacing potential research partnerships and breakthroughs. This is the type of database researchers have dreamed of for years. Would CZI be willing to fund something similar for fields other than science? Will they run into trouble with other organizations less interested in philanthropy?
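
The “PageRank for science” comparison can be made concrete with a toy citation graph. This is a sketch using networkx, not Meta’s actual ranking code; the paper IDs are invented.

    import networkx as nx

    # An edge A -> B means paper A cites paper B.
    citations = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P3"), ("P4", "P2")]
    G = nx.DiGraph(citations)

    # PageRank rewards papers that are cited often, and cited by well-cited papers.
    rank = nx.pagerank(G, alpha=0.85)
    for paper, score in sorted(rank.items(), key=lambda kv: kv[1], reverse=True):
        print(paper, round(score, 3))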

Whitney Grace, March 1, 2017

A New Search Engine Targeting Scientific Researchers Touts AI

January 27, 2017

The Search Engine Watch article titled “How a New AI Powered Search Engine Is Changing How Neuroscientists Do Research” discusses a new search engine geared toward scientific researchers. It is called Semantic Scholar, and it uses AI to provide a comprehensive resource for scientists. The article explains,

This new search engine is actually able to think and analyze a study’s worth. GeekWire notes that, “Semantic Scholar uses data mining, natural language processing, and computer vision to identify and present key elements from research papers.” The engine is able to understand when a paper is referencing its own study or results from another source. Semantic Scholar can then identify important details, pull figures, and compare one study to thousands of other articles within one field.

This ability to rank and sort papers by relevance is tremendously valuable given the vast number of academic papers online. Google Scholar, by comparison, might lead a researcher in the right direction with its index of over 200 million articles, but it simply does not expose the metadata researchers need, such as how often a paper or author has been cited. The creators of Semantic Scholar are not interested in competing with Google, but in providing a niche search engine tailored to meet the needs of the scientific community.
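
That reliance on citation metadata can be illustrated with a toy scoring function that blends simple term overlap with a log-damped citation count. The weights and the sample records are invented for illustration; Semantic Scholar’s real ranking is far richer.

    import math

    papers = [
        {"title": "Hippocampal memory consolidation", "abstract": "Consolidation during sleep.", "citations": 4200},
        {"title": "A note on memory tasks", "abstract": "Benchmarks for working memory tasks.", "citations": 12},
    ]

    def score_paper(paper, query_terms, w_cite=0.3):
        """Blend query-term overlap with a damped citation-count signal (illustrative weights)."""
        text = (paper["title"] + " " + paper["abstract"]).lower()
        overlap = sum(t.lower() in text for t in query_terms) / max(len(query_terms), 1)
        cite_signal = math.log1p(paper.get("citations", 0)) / 10.0
        return (1 - w_cite) * overlap + w_cite * cite_signal

    ranked = sorted(papers, key=lambda p: score_paper(p, ["hippocampus", "memory"]), reverse=True)
    print([p["title"] for p in ranked])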

Chelsea Kerwin, January 27, 2017

Apache Tika Could Be the Google of the Dark Web?

January 16, 2017

Conventional search engines can effectively index text-based content. Apache Tika, however, an open source content analysis toolkit whose dark web applications were developed with support from the Defense Advanced Research Projects Agency (DARPA), can identify and analyze many other kinds of content. This might enable law enforcement agencies to track all kinds of illicit activities on the Dark Web and possibly end them.

An article by Christian Mattmann titled “Could This Tool for the Dark Web Fight Human Trafficking and Worse?”, which appears on Startup Smart, says:

At present the most easily indexed material from the web is text. But as much as 89 to 96 percent of the content on the internet is actually something else – images, video, audio, in all thousands of different kinds of non-textual data types. Further, the vast majority of online content isn’t available in a form that’s easily indexed by electronic archiving systems like Google’s.

Apache Tika, which Mattmann helped develop, bridges the gap by analyzing the metadata of each file to determine its content type and then identifying the content itself using techniques like named entity recognition (NER). Apache Tika was instrumental in tracking down players in the Panama Papers scandal.
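
A minimal sketch of that two-step flow, assuming the tika-python bindings (which call out to a local Tika server) and spaCy with an English model are installed; the file name is a placeholder, and spaCy stands in for whatever entity extractor a real deployment would use.

    import spacy
    from tika import parser

    # Step 1: Tika detects the file type and extracts text plus metadata.
    parsed = parser.from_file("crawled_page.pdf")
    metadata = parsed.get("metadata", {})
    text = parsed.get("content") or ""

    # Step 2: named entity recognition over the extracted text.
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(text)
    entities = {(ent.text, ent.label_) for ent in doc.ents}

    print(metadata.get("Content-Type"))
    print(sorted(entities)[:10])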

If Apache Tika is capable of what is claimed, many illicit activities on the Dark Web, such as human trafficking and drug and arms peddling, can be stopped in their tracks. As the author points out in the article:

Employing Tika to monitor the deep and dark web continuously could help identify human- and weapons-trafficking situations shortly after the photos are posted online. That could stop a crime from occurring and save lives.

However, the system is not yet sophisticated enough to handle the amount of content that is out there. Because the code is open source, someone may in the near future make it capable of doing so. Until then, the actors of the Dark Web can breathe a sigh of relief.

Vishal Ingole, January 16, 2017


Companies to Watch: Geo-Data Analytics

November 1, 2016

I noted “NGA Chooses 16 Orgs for Disparate Data Challenge Phase 2.” “NGA” is the acronym for the National Geospatial-Intelligence Agency. The geo-analytics folks at this unit do some fascinating things. The future, however, demands more than today’s good enough. NGA tapped 16 outfits to do some poking around in their innovation tool chests. Here are the firms:

  • App Symphony
  • Blue Zoo
  • CyberGIS
  • Diffeo
  • Enigma
  • Envitia
  • GeoFairy
  • MARI
  • MediaFlux
  • Meta DDC
  • Paxata
  • Pyxis
  • RAMADDA
  • SitScape
  • Sourcerer
  • Voyager

Recognize any of these outfits? Getting familiar with them might be a useful exercise.

Stephen E Arnold, November 1, 2016

Machine Learning Changes the Way We Learn from Data

October 26, 2016

The technology blog post from Daniel Miessler titled “Machine Learning is the New Statistics” strives to convey how crucial Machine Learning has become to the way we gather information about the world around us. Rather than dismissing Machine Learning as a buzzword, the author heralds it as an advancement in our ability to engage with the world. The article states,

So Machine Learning is not merely a new trick, a trend, or even a milestone. It’s not like the next gadget, instant messaging, or smartphones, or even the move to mobile. It’s nothing less than a foundational upgrade to our ability to learn about the world, which applies to nearly everything else we care about. Statistics greatly magnified our ability to do that, and Machine Learning will take us even further.

The article breaks down the stages of our ability to analyze our own reality, moving from randomly explaining events, to explanations based on the past, to explanations based on comparisons with numerous trends and metadata. It positions Machine Learning as the next step: an explanation that still compares events but also improves that comparison by generating new models. The difference, of course, is that Machine Learning offers continuous model improvement. If you are interested, the blog also offers a Machine Learning Primer.
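
In code, “continuous model improvement” is just incremental fitting: instead of estimating a fixed formula once, the model updates every time new observations arrive. A minimal sketch with scikit-learn’s partial_fit, using synthetic data:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    model = SGDClassifier()                 # a linear classifier trained incrementally with SGD
    classes = np.array([0, 1])
    rng = np.random.default_rng(0)

    for batch in range(5):
        # Each new batch of observations nudges the model's parameters.
        X = rng.normal(size=(32, 4))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        model.partial_fit(X, y, classes=classes)

    X_test = rng.normal(size=(200, 4))
    y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)
    print("accuracy so far:", model.score(X_test, y_test))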

Chelsea Kerwin, October 26, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

New Tor Communication Software for Journalists and Sources Launches

February 29, 2016

A new one-to-one messaging tool for journalists has launched after two years in development. The article “Ricochet uses power of the dark web to help journalists, sources dodge metadata laws” from The Age describes this new darknet-based software. The unique feature of Ricochet, compared with other tools journalists use such as Wickr, is that it does not rely on a central server; instead, it routes traffic through Tor. Advocates acknowledge the risk of this Dark Web software being used for criminal activity, but they assert the aim is to give sources and whistleblowers an anonymous channel for securely releasing information to journalists without exposure. The article explains,

“Dr Dreyfus said that the benefits of making the software available would outweigh any risks that it could be used for malicious purposes such as cloaking criminal and terrorist operations. “You have to accept that there are tools, which on balance are a much greater good to society even though there’s a tiny possibility they could be used for something less good,” she said. Mr Gray argued that Ricochet was designed for one-to-one communications that would be less appealing to criminal and terrorist organisers that need many-to-many communications to carry out attacks and operations. Regardless, he said, the criminals and terrorists had so many encryption and anonymising technologies available to them that pointing fingers at any one of them was futile.”

Demand for online anonymity is increasing, as evidenced by the recent launch of several new Tor-based tools like Ricochet, alongside Wickr and consumer-oriented apps like Snapchat. The Dark Web’s user base appears to be growing and diversifying. Will public perception follow suit?
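
The “no server, just Tor” design means each participant publishes a hidden service of their own and peers connect to that onion address directly. This is not Ricochet’s code; it is a rough sketch, assuming a local Tor daemon with its control port open and the stem library installed.

    from stem.control import Controller

    # Ask the local Tor daemon to publish an ephemeral hidden service that
    # forwards onion port 80 to a listener on localhost:8080. Peers reach us
    # at <service_id>.onion with no central server involved.
    with Controller.from_port() as controller:
        controller.authenticate()
        service = controller.create_ephemeral_hidden_service(
            {80: 8080}, await_publication=True
        )
        print("reachable at %s.onion" % service.service_id)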


Megan Feil, February 29, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Microsoft AI Faves

February 9, 2016

I noted a blog post called “From Discovery to Selection: Announcing the Seattle Accelerator’s Third Batch.” The post lists companies which Microsoft wants to nurture. Here’s the list:

  • Affinio: Audience insights
  • Agolo: Summarization of text
  • Clarify: Rich media search
  • Defined Crowd: Natural language processing
  • Knomos: Palantir style analysis
  • Medwhat: Doctor made of soft software
  • OneBridge: Middleware for Microsoft cloud
  • Percolata: Retail staff monitoring
  • Plexuss: Palantir style analysis
  • Sim Machines: Similarity search and pattern recognition

Net net: Microsoft continues to hunt for solutions in search and analytics. There is a touch of “me too” in the niche plays too. Persistence is a virtue.

Stephen E Arnold, February 9, 2016
