Terror Database Enriched with Social Media Pix

April 24, 2018

Questions are surging through the tech and espionage communities after a recent article with big implications for both worlds. A company formed by ex-spies is using facial recognition software to build a database of images from social networks like Facebook. This raises a ton of questions, but they all start with the recent Daily Mail piece, “Surveillance Company Run by Ex-Spies is Harvesting Facebook Photos.”

According to the story, the program is called Face-Int and they have a specific goal in mind:

“Its creators say the software could lead to the identification of terror suspects, captured in promotional and other material posted online… “Experts are concerned that the company’s efforts extend beyond this remit, however, and into the political realm…’It raises the stakes of face recognition – it intensifies the potential negative consequences,’ Jay Stanley, senior policy analyst at the American Civil Liberties Union, told Forbes.”

While it is admirable that a company is aiming to help capture terrorists through social media, the approach leaves one to worry about several things. For starters, it’s pretty safe to assume many terrorists will not appear on social media or, at the least, not without something covering their face. Thus, accuracy becomes a concern. This, however, does not touch upon the greater concern that private, law-abiding citizens are also getting funneled into this database. The opportunities for invading one’s privacy are alarmingly high. Time will tell how this shakes out, but we have a hunch the general public will never be told.

Patrick Roland, April 24, 2018

Facial Recognition for a Certain Type of Bro

April 11, 2018

Male white privilege is a topic that pervades social and cultural discourse, and according to The Seattle Times article “Facial-Recognition Technology Works Best If You’re A White Guy, Study Says,” the bias exists in facial recognition technology too. AI’s ability to recognize people improves each day. Developers improve the technology by feeding AI data that help it learn to discern physical differences such as gender, skin color, facial features, and other traits. It seems, however, that the data sets are overloaded with white men.

Apparently facial recognition software is 99 percent accurate in identifying white men, but the darker a person’s skin is, the more errors arise. MIT researcher Joy Buolamwini discovered the disparities and said they were a reflection of real-world biases. The AI is only as smart as the people who program it and the data they feed it:

“In modern artificial intelligence, data rules. AI software is only as smart as the data used to train it. If there are many more white men than black women in the system, it will be worse at identifying the black women. One widely used facial recognition data set was estimated to be more than 75 percent male and more than 80 percent white, according to another research study.”
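The point in the quote, that a skewed data set produces skewed accuracy, can be checked with a simple audit. The sketch below is a minimal illustration and not the method used in the study: the records, group labels, and outcomes are invented, and the function names are hypothetical. It reports each group's share of the data and the accuracy computed separately for that group.

```python
from collections import Counter, defaultdict

# Hypothetical evaluation records: (demographic group, was the prediction correct?).
# These groups and outcomes are invented for illustration, not taken from the study.
records = [
    ("lighter-skinned male", True),
    ("lighter-skinned male", True),
    ("lighter-skinned male", True),
    ("lighter-skinned male", True),
    ("lighter-skinned female", True),
    ("lighter-skinned female", False),
    ("darker-skinned male", True),
    ("darker-skinned female", False),
]


def group_share(rows):
    """Return each group's share of the data set as a fraction of all rows."""
    counts = Counter(group for group, _ in rows)
    total = len(rows)
    return {group: n / total for group, n in counts.items()}


def group_accuracy(rows):
    """Return accuracy computed separately for each group."""
    hits = defaultdict(int)
    seen = defaultdict(int)
    for group, correct in rows:
        seen[group] += 1
        hits[group] += int(correct)
    return {group: hits[group] / seen[group] for group in seen}


shares = group_share(records)
accuracy = group_accuracy(records)
for group in shares:
    print(f"{group}: {shares[group]:.0%} of data, {accuracy[group]:.0%} accurate")
```

A single overall accuracy figure would hide the gap between groups; reporting the numbers per group is what makes the imbalance visible.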

Another alarming factor is that facial recognition and related technologies have a high adoption rate: companies use them to target social media ads and to automate decisions such as hiring and lending. Do not forget that law enforcement officials are relying more on the technology and minorities are more likely to be singled out in databases.

While this information is discouraging, it makes a bigger issue out of something that can be easily remedied. Yes, the data is skewed towards white males because, based on statistics, more white men work in the technology field, so they draw on data they have ready access to. It is the same in the genetics field: European and Asian genes are more accurately represented than African DNA, because these countries are more developed than the mother continent. To resolve this conundrum, developers need to start feeding facial recognition technology data with more females and people with darker skin. It is probably not that hard to find the data: just visit social media or an image library, then download away.

Whitney Grace, April 11, 2018

Artificial Intelligence: Tiny Ears May Listen Well

March 29, 2018

The allegations that Facebook-type companies can “listen” to one’s telephone conversations or regular conversations may be “fake” news. But the idea is worth considering.

Artificial intelligence’s ability to process written data is unparalleled. However, the technology has always lagged pretty severely when it comes to spoken words. Soon, that will be a thing of the past if this recent article is to be believed. We learned more from the Smart Data Collective piece, “Natural Language Processing: An Essential Element of Artificial Intelligence.”

According to the story:

“Natural Language Processing (NLP) is an important part of artificial intelligence which is being researched upon to aid enterprises and businesses in the quick, speedy and fast retrieval of both structured and unstructured organizational data when needed. In simple terms, natural language processing (NLP), is the skill of a machine to understand and process human language within the context in which it is spoken.”

This technology is really taking off in the food industry. According to sources, shoppers in London are the first to use language processing apps to help them determine what vitamins their bodies may be lacking. It may sound like a stretch, but this is the sweet spot where AI really soars. The technology seems to take off in industries that previously felt like they needed no help. Watch for language processing to begin bleeding into everyday life elsewhere, too. If one is carrying a mobile phone, is it listening and recording, converting speech to text, and indexing that content for psychographic analysis?

Patrick Roland, March 29, 2018

A Step Forward but Museum Image Collections Remain a Search Challenge

March 8, 2018

For a few decades, art and history museums have been struggling with their online presences. The experience of seeing a JPEG of a painting or sculpture is not the same as seeing it in person. That’s true. But there is one area where museums are holding a lot of valuable data, and just now it’s starting to be searchable. We discovered this recently when we came across the Metropolitan Museum of Art’s database “MetPublications.”

According to the page:

“MetPublications includes a description and table of contents for most titles, as well as information about the authors, reviews, awards, and links to related Met titles by author and by theme. Current book titles that are in-print may be previewed and fully searched online, with a link to purchase the book. The full contents of almost all other book titles may be read online, searched, or downloaded as a PDF.”

This includes over five hundred books about various exhibits that have spanned the last five decades. These slim volumes, usually released in conjunction with the exhibits, are fully searchable and a huge score for art lovers and historians. Previously, such a project was seen as too daunting and potentially impossible. As far back as 2002, Computer Weekly was bemoaning the fact that museums had missed the digital boat. Turns out museums like the Met didn’t miss the boat; it’s just that their ship sails a little more slowly than the white-knuckle world of Silicon Valley. Better late than never, we say.

Patrick Roland, March 8, 2018

Visual Search Enters Its Next Phase

February 16, 2018

About a year ago, some of the biggest names in search declared that visual search was the next big horizon in the industry and that they were pouring great gobs of money into this world. If you are like us, visual search is not exactly part of your everyday life yet. But, that doesn’t mean it isn’t evolving, as we discovered in a fascinating Digital Trends story, “Not Happy With Pinterest Search Results? Refine it With Text and Photo Queries.”

According to the story:

“Pinterest announced the addition of text searches that work within the visual search tool, allowing users to give Pinterest Lens a bit more direction on the intent of the search. According to Pinterest, users make an average of 600 million searches every month.”

That’s a serious trend and an uptick from past numbers we have seen. However, all these advances still don’t seem to be creeping into our daily life…yet. As reported by IT Pro Portal, retailers are seriously starting to adopt visual search technology. This directly stems from the rise of shopping via cell phone, as opposed to laptops. And, as we all know, phones are custom made for visual search thanks to their cameras. The technology sounds like it is there, our interest as shoppers is there, and we think a storm is on the horizon in which visual search soon overtakes the retail market.

Patrick Roland, February 16, 2018

Transcribing Podcasts with Help from Amazon

January 19, 2018

I enjoy walking the dog and listening to podcasts. However, I read more quickly than I listen. Speed up is a feature which works well for those in their mid 20s. At age 74, not so much.

Few podcasts create transcripts. Kudos to Steve Gibson at Security Now. He pays for this work himself because other podcasts on the Twit network don’t offer much in the way of transcripts. And in the case of This Week in Law, there aren’t weekly programs. Recently, no programs. Helpful, no?

You can get the basics of the transcriptions produced by Amazon Transcribe in “Podcast Transcription with Amazon Transcribe.”

One has to be a programmer to use the service. Here’s the passage in the write up I highlighted:

The first thing that I would want out of this is speaker detection, i.e. knowing how many different speakers there are and to be able to differentiate their voices. Podcasts typically have more than one host, or a host and a guest for an interview, so that would be helpful. Also, it would be great to be able to send back corrections on words somehow, to help with the training. I’m sure Amazon has a pretty good thing going, but maybe on an account level? Or for proper nouns? I still think it would be good for people to provide that feedback.

Perhaps the podcast transcript void can be filled—at long last.

Stephen E Arnold, January 19, 2018

Dark Cyber: A New HonkinNews Series from Stephen E Arnold

November 21, 2017

HonkinNews is back with a new series of videos. You can watch the program at this link on YouTube. Dark Cyber presents selected news from the Beyond Search blog and from the research conducted for Stephen E Arnold’s Dark Web Notebook, a companion to the hidden Internet tailored to the needs of security, law enforcement, and intelligence professionals. In this first Dark Cyber program, you will learn about an information-packed report about surveillance technologies and practices. The report, published by the Electronic Frontier Foundation, is available without charge. The push for a backdoor to encrypted information continues. We report that Senator Mitch McConnell from Kentucky is criticizing Facebook and Google for the firms’ perceived reluctance to assist with certain legitimate requests for information. But the Kentucky senator is a small cog in a larger push by the US government to obtain backdoors to unlock encrypted data. Funding continues to flow into Dark Cyber firms. We review three cash infusions and compare those amounts to the massive funding provided to the UK firm Darktrace. Arnold addresses the widely held belief that the Tor software bundle delivers bulletproof Web access. One key point is that Tor’s security fixes do not address the monitoring of Tor entry and exit servers and log file analysis. For daily news and information about the Surface Web and Dark Web, read Beyond Search at www.arnoldit.com/wordpress. PS. Yes, there is a Harrod’s Creek duck in the video. Here’s that link again: https://youtu.be/a6WiGC2W13g

Kenny Toth, November 21, 2017

Listen Notes: A Podcast Search Engine

October 18, 2017

I read “This Podcast Search Engine Helps You Discover New Shows You’ll Love.” The search engine is called Listen Notes. The content pool is 350,000 podcasts and about 20 million episodes. I ran a query for the popular Twit.tv podcast Security Now. No hits. I then ran a query for the unpopular HonkinNews program which I did for one year. No hits. Your mileage may vary. As horrible as the iTunes search system is, it sort of works for podcasts. I am still in search of a good enough podcast search tool. Maybe Listen Notes should just use one of the remarkable enterprise search systems which handle “all” content. Yeah, that will work.

Stephen E Arnold, October 18, 2017

Google and Video Search: Still a Challenge

August 31, 2017

I read “How YouTube Perfected the Feed.” The main idea is that Google used smart software to make YouTube videos easier to find. The trick is not keyword search. Google’s YouTube, which the write up calls a “pillar of the Internet,” uses signals to identify what a person wants. Then smart software delivers recommendations. The “new” YouTube’s secret sauce is described this way:

McFadden [a Google wizard] revealed the source of YouTube’s suddenly savvy recommendations: Google Brain, the parent company’s artificial intelligence division, which YouTube began using in 2015. Brain wasn’t YouTube’s first attempt at using AI; the company had applied machine-learning techniques to recommendations before, using a Google-built system known as Sibyl. Brain, however, employs a technique known as unsupervised learning: its algorithms can find relationships between different inputs that software engineers never would have guessed. “One of the key things it does is it’s able to generalize,” McFadden said. “Whereas before, if I watch this video from a comedian, our recommendations were pretty good at saying, here’s another one just like it. But the Google Brain model figures out other comedians who are similar but not exactly the same — even more adjacent relationships. It’s able to see patterns that are less obvious.”

The point of the exercise is to generate more ad revenue. With competition from Facebook and others, Google is facing another crack in its control of search. Amazon may generate three times the number of product searches as Google. That’s another problem for the GOOG.

Now the talk about smart software is thrilling to many. For me, I highlighted this statement in the article as quite suggestive about the method:

YouTube’s emphasis on videos related to ones you might like means that its feed consistently seems broader in scope — more curious — than its peers. The further afield YouTube looks for content, the more it feels like an escape from other feeds.

The smart software is not about search. Google is processing signals and looking for similarities. I don’t want to be a grouser, but these themes have peppered Google patent documents and technical papers for many years. In my Google: The Digital Gutenberg I reviewed some of the wonkier video ideas. (By the way, the “Gutenberg” metaphor refers to the automatically generated content which Google outputs in response to user actions. Facebook may be more prolific today, but when I was working on Google: The Digital Gutenberg, Google had the distinction of being the world’s largest digital artifact producer.)
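For readers who want to see what “processing signals and looking for similarities” can mean at its simplest, here is a minimal sketch. It uses plain cosine similarity over invented viewing counts. It is not Google Brain’s method, which the article does not detail; the video names, viewer names, and functions are all hypothetical.

```python
import math

# Hypothetical viewing signals: for each video, how often each viewer watched it.
# Video names, viewer names, and counts are invented for illustration.
signals = {
    "comedian_a_special": {"viewer1": 3, "viewer2": 1, "viewer3": 0},
    "comedian_b_clip": {"viewer1": 2, "viewer2": 1, "viewer3": 0},
    "table_tennis_match": {"viewer1": 0, "viewer2": 0, "viewer3": 4},
}


def cosine(a, b):
    """Cosine similarity between two sparse signal vectors stored as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(video, catalog):
    """Return the name of the other video whose signals best match the given one."""
    scored = [
        (cosine(catalog[video], vector), name)
        for name, vector in catalog.items()
        if name != video
    ]
    return max(scored)[1]


print(most_similar("comedian_a_special", signals))
```

In this toy catalog the two comedy videos share an audience, so one is suggested for the other. Nothing in the sketch looks at what the videos contain, which is the point: the method compares signals and does no searching.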

Several observations:

First, finding videos remains a difficult information retrieval task. I recall the promising approach of Exalead, before Dassault bought the company and used the technology to reduce its dependence on Autonomy and deploy a way to find nuts and bolts. Exalead converted speech to text, generated some semi-useful metadata, and allowed me to search for a word or phrase. The system would then display links to videos which contained the string. The problem with video search is that it is visual and, to my knowledge, no one has figured out how to have software convert an image to a searchable string. Years ago, I saw a demo from an Israeli company whose software could “watch” a soccer match and flag the goals sometimes. Google’s video search is useful when one looks for words in video titles, video descriptions, video channel names, or the entity producing or starring in the video.

Second, recommendations work reasonably well for digital Walmart-type shoppers. However, many recommendations are off the wall. I bought a bottle of itch reliever spray for my dog. The product was designed for saddle horses. Now Amazon happily shows me boots, bits, and bridles. Other recommendation systems will work the same way. The reason? Signals are given incorrect “weights” and the clustering methods drift away because many smart software methods are “greedy.” (I have a for fee lecture on this subject which is pretty darned interesting and important. Curious? Write benkent2020 at yahoo dot com for info.)

Third, Google’s smart software for video continues to struggle with uploads that are on some pretty dicey topics. I routinely get links to YouTube videos which require me to be over 18. You can check out Google’s filtering for certain content by running queries on both YouTube.com and GoogleVideo.com for “nasheed.” Yep, interesting “promotional” videos are in evidence.
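The Exalead-style approach in the first observation (convert speech to text, then let a user search for a word and get links to the matching videos) can be sketched as a small inverted index. This is a minimal illustration under stated assumptions, not Exalead’s implementation: the transcripts, video ids, and function names are invented, the speech-to-text step is assumed to have already run, and only single-word queries are handled.

```python
from collections import defaultdict

# Hypothetical output of a speech-to-text step:
# video id -> list of (start time in seconds, spoken text).
transcripts = {
    "video-001": [
        (0, "welcome to the security podcast"),
        (42, "today we discuss encryption"),
    ],
    "video-002": [
        (5, "the goal came late in the match"),
        (90, "encryption was not discussed"),
    ],
}


def build_index(docs):
    """Map each lowercase word to the (video id, start time) segments where it is spoken."""
    index = defaultdict(list)
    for video_id, segments in docs.items():
        for start, text in segments:
            # set() ensures a word repeated within one segment is indexed once.
            for word in set(text.lower().split()):
                index[word].append((video_id, start))
    return index


def search(index, word):
    """Return the sorted segments that contain the word, or an empty list."""
    return sorted(index.get(word.lower(), []))


index = build_index(transcripts)
print(search(index, "encryption"))
```

Note what the sketch cannot do: it finds only what was said, not what was shown. A goal in a soccer match is found only if someone says the word “goal.”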

Net net: Talk about smart software creates the impression that great progress in video content access is being made. I agree. There is progress; however, finding videos remains a work in progress.

I suppose Amazon will sell me a horse when it runs out of farm fresh Echoes. Google is recommending videos to me which don’t match what I usually look for. I was curious about non Newtonian fluids. Guess what Google suggested I view? A Chinese table tennis match and my own video.

There you go.

Stephen E Arnold, August 31, 2017

HonkinNews for April 11, 2017, Now Available

April 11, 2017

This week’s HonkinNews video program leads with information about Bitext, a company providing breakthrough deep linguistic analysis solutions. In order to put the comments of Dr. Antonio Valderrabanos in perspective, HonkinNews takes a look at the “promo” article discussing IBM’s cognitive computing activities. There is one key difference highlighted in HonkinNews: IBM talks jargon in recycled marketing language and Bitext’s CEO talks about the company’s rapid growth and licensing deals with companies like Audi, Renault, and one of the largest players in the mobile device and mobile services market. The program also looks at the remarkable 9,000 word Fortune Magazine article about Palantir Technologies’ interaction with US government procurement agencies. The very long article does not describe Palantir’s technical innovations nor does the Fortune analysis explain why using commercial off-the-shelf software for intelligence work makes sense. News about the Dark Web Notebook team’s three presentations at the prestigious TechnoSecurity & Digital Forensics Conference in June 2017 complements a special offer for the only handbook to Dark Web investigations available. For discount information, check out the links displayed in the video. The video also takes a look at the new Yahoo. Once the transformation of Yahoo into Oath, with a punctuation mark no less, takes place, the Yahoo yodel will become a faint auditory memory. Does the HonkinNews item trigger an auditory memory? Watch this week’s video to find out. You can watch the video at this link.

Kenny Toth, April 11, 2017
