Transcribing Podcasts with Help from Amazon

January 19, 2018

I enjoy walking the dog and listening to podcasts. However, I read more quickly than I listen. Speed up is a feature which works well for those in their mid 20s. At age 74, not so much.

Few podcasts create transcripts. Kudos to Steve Gibson at Security Now. He pays for this work himself because other podcasts on the Twit network don’t offer much in the way of transcripts. And in the case of This Week in Law, there aren’t weekly programs. Recently, no programs. Helpful, no?

You can get the basics of the transcriptions produced by Amazon Transcribe in “Podcast Transcription with Amazon Transcribe.”

One has to be a programmer to use the service. Here’s the passage in the write up I highlighted:

The first thing that I would want out of this is speaker detection, i.e. knowing how many different speakers there are and to be able to differentiate their voices. Podcasts typically have more than one host, or a host and a guest for an interview, so that would be helpful. Also, it would be great to be able to send back corrections on words somehow, to help with the training. I’m sure Amazon has a pretty good thing going, but maybe on an account level? Or for proper nouns? I still think it would be good for people to provide that feedback.
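For a sense of what "being a programmer" involves here, the sketch below assembles the request one would send to start a transcription job. It assumes the boto3 Python library and its `start_transcription_job` call. The job name, bucket, and file name are made up for illustration, and none of this comes from the write up itself.

```python
def build_transcription_request(job_name, media_uri,
                                media_format="mp3", language="en-US"):
    """Assemble keyword arguments for a Transcribe StartTranscriptionJob call."""
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": media_uri},
    }


# Hypothetical episode stored in a hypothetical S3 bucket.
request = build_transcription_request(
    "security-now-episode",
    "s3://example-podcast-bucket/security-now.mp3",
)

# With AWS credentials configured, the job would be submitted like this:
#   import boto3
#   boto3.client("transcribe").start_transcription_job(**request)
print(request["Media"]["MediaFileUri"])
```

The request is built separately from the network call so the parameters can be inspected without an AWS account.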

Perhaps the podcast transcript void can be filled—at long last.

Stephen E Arnold, January 19, 2018

Dark Cyber: A New HonkinNews Series from Stephen E Arnold

November 21, 2017

HonkinNews is back with a new series of videos. You can watch the program at this link on YouTube. Dark Cyber presents selected news from the Beyond Search blog and from the research conducted for Stephen E Arnold’s Dark Web Notebook, a companion to the hidden Internet tailored to the needs of security, law enforcement, and intelligence professionals. In this first Dark Cyber program, you will learn about an information-packed report about surveillance technologies and practices. The report, published by the Electronic Frontier Foundation, is available without charge. The push for a backdoor to encrypted information continues. We report that Senator Mitch McConnell from Kentucky is criticizing Facebook and Google for the firms’ perceived reluctance to assist with certain legitimate requests for information. But the Kentucky senator is a small cog in a larger push by the US government to obtain backdoors to unlock encrypted data. Funding continues to flow into Dark Cyber firms. We review three cash infusions and compare those amounts to the massive funding provided to the UK firm Darktrace. Arnold addresses the widely held belief that the Tor software bundle delivers bulletproof Web access. One key point is that Tor’s security fixes do not address the monitoring of Tor entry and exit servers and log file analysis. For daily news and information about the Surface Web and Dark Web, read Beyond Search at www.arnoldit.com/wordpress. PS. Yes, there is a Harrod’s Creek duck in the video. Here’s that link again: https://youtu.be/a6WiGC2W13g

Kenny Toth, November 21, 2017

Listen Notes: A Podcast Search Engine

October 18, 2017

I read “This Podcast Search Engine Helps You Discover New Shows You’ll Love.” The search engine is called Listen Notes. The content pool is 350,000 podcasts and about 20 million episodes. I ran a query for the popular Twit.tv podcast Security Now. No hits. I then ran a query for the unpopular HonkinNews program which I did for one year. No hits. Your mileage may vary. As horrible as the iTunes search system is, it sort of works for podcasts. I am still in search of a good enough podcast search tool. Maybe Listen Notes should just use one of the remarkable enterprise search systems which handle “all” content. Yeah, that will work.

Stephen E Arnold, October 18, 2017

Google and Video Search: Still a Challenge

August 31, 2017

I read “How YouTube Perfected the Feed.” The main idea is that Google used smart software to make YouTube videos easier to find. The trick is not keyword search. Google’s YouTube, which the write up calls a “pillar of the Internet,” uses signals to identify what a person wants. Then smart software delivers recommendations. The “new” YouTube’s secret sauce is described this way:

McFadden [a Google wizard] revealed the source of YouTube’s suddenly savvy recommendations: Google Brain, the parent company’s artificial intelligence division, which YouTube began using in 2015. Brain wasn’t YouTube’s first attempt at using AI; the company had applied machine-learning techniques to recommendations before, using a Google-built system known as Sibyl. Brain, however, employs a technique known as unsupervised learning: its algorithms can find relationships between different inputs that software engineers never would have guessed. “One of the key things it does is it’s able to generalize,” McFadden said. “Whereas before, if I watch this video from a comedian, our recommendations were pretty good at saying, here’s another one just like it. But the Google Brain model figures out other comedians who are similar but not exactly the same — even more adjacent relationships. It’s able to see patterns that are less obvious.”

The point of the exercise is to generate more ad revenue. With competition from Facebook and others, Google is facing another crack in its control of search. Amazon may generate three times the number of product searches as Google. That’s another problem for the GOOG.

Now the talk about smart software is thrilling to many. For me, I highlighted this statement in the article as quite suggestive about the method:

YouTube’s emphasis on videos related to ones you might like means that its feed consistently seems broader in scope — more curious — than its peers. The further afield YouTube looks for content, the more it feels like an escape from other feeds.

The smart software is not about search. Google is processing signals and looking for similarities. I don’t want to be a grouser, but these themes have peppered Google patent documents and technical papers for many years. In my Google: The Digital Gutenberg I reviewed some of the wonkier video ideas. (By the way, the “Gutenberg” metaphor refers to the automatically generated content which Google outputs in response to user actions. Facebook may be more prolific today, but when I was working on Google: The Digital Gutenberg, Google had the distinction of being the world’s largest digital artifact producer.)

Several observations:

First, finding videos remains a difficult information retrieval task. I recall the promising approach of Exalead, before Dassault bought the company and used the technology to reduce its dependence on Autonomy and deploy a way to find nuts and bolts. Exalead converted speech to text, generated some semi-useful metadata, and allowed me to search for a word or phrase. The system would then display links to videos which contained the string. The problem with video search is that it is visual and, to my knowledge, no one has figured out how to have software convert an image to a searchable string. Years ago, I saw a demo from an Israeli company whose software could “watch” a soccer match and flag the goals sometimes. Google’s video search is useful when one looks for words in video titles, video descriptions, video channel names, or the entity producing or starring in the video.

Second, recommendations work reasonably well for digital Walmart-type shoppers. However, many recommendations are off the wall. I bought a bottle of itch reliever spray for my dog. The product was designed for saddle horses. Now Amazon happily shows me boots, bits, and bridles. Other recommendation systems will work the same way. The reason? Signals are given incorrect “weights” and the clustering methods drift away because many smart software methods are “greedy.” (I have a for fee lecture on this subject which is pretty darned interesting and important. Curious? Write benkent2020 at yahoo dot com for info.)

Third, Google’s smart software for video continues to struggle with uploads that are on some pretty dicey topics. I routinely get links to YouTube videos which require me to be over 18. You can check out Google’s filtering for certain content by running queries on both YouTube.com and GoogleVideo.com for “nasheed.” Yep, interesting “promotional” videos are in evidence.
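The transcript-based approach mentioned in the first observation (turn speech into text, then look for a word or phrase and return the videos containing it) can be sketched in a few lines. This is a minimal illustration, not Exalead’s method: it assumes the speech-to-text step has already produced transcripts, and the video identifiers and transcript text are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    """Map each word to {video_id: [positions where the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for video_id, text in transcripts.items():
        for position, word in enumerate(tokenize(text)):
            index[word][video_id].append(position)
    return index


def search(index, phrase):
    """Return sorted video ids whose transcript contains the exact phrase."""
    words = tokenize(phrase)
    if not words:
        return []
    hits = []
    for video_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Every later word must appear at the next consecutive position.
            if all(start + offset in index.get(word, {}).get(video_id, [])
                   for offset, word in enumerate(words)):
                hits.append(video_id)
                break
    return sorted(hits)


# Invented transcripts standing in for speech-to-text output.
transcripts = {
    "video-1": "the keeper could not stop the goal",
    "video-2": "a goal kick after the stop",
    "video-3": "non Newtonian fluids explained",
}
index = build_index(transcripts)
print(search(index, "stop the goal"))
print(search(index, "goal"))
```

Note what the sketch cannot do: it finds only what was said aloud. Anything that is purely visual never reaches the index, which is the gap described above.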

Net net: Talk about smart software creates the impression that great progress in video content access is being made. I agree. There is progress; however, finding videos remains a work in progress.

I suppose Amazon will sell me a horse when it runs out of farm fresh Echoes. Google is recommending videos to me which don’t match what I usually look for. I was curious about non Newtonian fluids. Guess what Google suggested I view? A Chinese table tennis match and my own video.

There you go.

Stephen E Arnold, August 31, 2017

HonkinNews for April 11, 2017, Now Available

April 11, 2017

This week’s HonkinNews video program leads with information about Bitext, a company providing breakthrough deep linguistic analysis solutions. In order to put the comments of Dr. Antonio Valderrabanos in perspective, HonkinNews takes a look at the “promo” article discussing IBM’s cognitive computing activities. There is one key difference highlighted in HonkinNews: IBM talks jargon in recycled marketing language and Bitext’s CEO talks about the company’s rapid growth and licensing deals with companies like Audi, Renault, and one of the largest players in the mobile device and mobile services market. The program also looks at the remarkable 9,000 word Fortune Magazine article about Palantir Technologies’ interaction with US government procurement agencies. The very long article does not describe Palantir’s technical innovations nor does the Fortune analysis explain why using commercial off-the-shelf software for intelligence work makes sense. News about the Dark Web Notebook team’s three presentations at the prestigious TechnoSecurity & Digital Forensics Conference in June 2017 complements a special offer for the only handbook to Dark Web investigations available. For discount information, check out the links displayed in the video. The video also takes a look at the new Yahoo. Once the transformation of Yahoo into Oath with a punctuation mark no less takes place, the Yahoo yodel will become a faint auditory memory. Does the HonkinNews item trigger an auditory memory? Watch this week’s video to find out. You can watch the video at this link.

Kenny Toth, April 11, 2017

HonkinNews for 28 Feb 2017 Now Available

February 28, 2017

This week’s HonkinNews considers the Facebook “manifesto.” Our interpretation is that companies like Facebook are countries too. Aren’t we lucky? The IBM security conference is scheduled for March 2017 and Beyond Search was invited. We assume that the data science root access breach will be one highlighted case study. The program also comments on the Pinterest Lens technology. Now after “pintering”, one can locate and buy a product. No words required. Two stories illustrate the depth or shallowness of thinking about online research. We present a list of “must use” search engines and note some notable omissions. Then we consider a comparison of conducting research on an ad supported system versus the commercial databases, books, and journals at a first-rate research library like Dartmouth’s. The subject of Google’s Loon balloons drifts in as well. We consider the question: Will Facebook free Internet drones engage in combat with Google’s free Internet Loon balloons? You can find it at this link.

Kenny Toth, February 28, 2017

HonkinNews for 21 February Now Available

February 21, 2017

Hang onto your lightweight mobile. HonkinNews lets you watch recall, precision, and relevance being kicked to pieces by a real live SEO expert and famed author. We love that “famed” thing. You will also get a peek at how to visualize innovation. Inside the box and outside the box look tame compared to our view of the real world. We give you a tip for searching for an image in the Metropolitan Museum of Art’s digital collection of 350,000 images. You may not like the answer. We did not. If you have a mainframe in your home office, you can load Watson and let it index your significant other’s recipes, or you can process a local bank’s overnight cash transactions. Either way, IBM gives you some Watson juice. And you will get a bit of information about Yahoo’s most recent security issue. Yep, yabba dabba hoot.

Kenny Toth, February 21, 2017

HonkinNews for 14 February 2017 Now Available

February 14, 2017

Want some tax love? HonkinNews explains that you can visit an H&R Block store front and “touch” IBM Watson. Sounds inviting, doesn’t it? You will also learn about the fate of Lexmark’s search and content businesses under the firm’s new ownership. Denmark has appointed an ambassador to Sillycon Valley. Perhaps Apple, Facebook, and Google really are nation states? Google’s cloud wizard has some job advice for the newly terminated. Perhaps dog training collars are a breakthrough for those eager to acquire new skills. Lucid Imagination became Lucidworks. Now the company has positioned itself to deliver Exalead style search based applications. The play did not work too well for Exalead, which wrote the book about SBAs. Will Lucidworks make the me-too strategy pay off for the company’s backers and their tens of millions of dollars? We also catalog the many ways to search using the Pixel phone. Whatever happened to universal search? We reveal where to live if you want easy access to old fashioned book stores. No, it is not Harrod’s Creek, Kentucky. You can view the video at this link.

Kenny Toth, February 14, 2017

Metropolitan Museum of Art: Images There but Findability Not

February 14, 2017

I recall the Google Life Magazine image collection. I noted the BBC archive of programs. I checked out the Internet Archive’s rich media collection. Years ago I worked on the Library of Congress’ American Memory project. These have a unifying thread:

The content is essentially unfindable.

I read “Metropolitan Museum of Art Puts 375,000 Public Domain Images in Creative Commons.” The write up explains:

As part of a new initiative it’s calling Open Access, the Metropolitan Museum of Art in New York has placed 375,000 images of public-domain works in the Creative Commons. This major, though not unprecedented, move by one of the world’s most important museums means that users can now access pictures of many of the Met’s holdings on Wikimedia…

You can try out the search system at this link. Good luck finding images. Remember Caravaggio is spelled with two g’s. Oh, the query returned a number of false drops.

The way to find images is to browse. Fun. Time consuming. Not good.

Stephen E Arnold, February 14, 2017

HonkinNews for 7 February 2017 Now Available

February 7, 2017

This week’s program highlights Google’s pre school and K-3 robot innovation from Boston Dynamics. In June 2016 we thought Toyota was purchasing the robot reindeer company. We think Boston Dynamics may still be part of the Alphabet letter set. Also, curious about search vendor pivots? Learn about two shuffles (Composite Software and CopperEye) which underscore why plain old search is a tough market. You will learn about the Alexa Conference and the winner of the Alexathon. Alexa seems to be a semi hot product. When will we move “beyond Alexa”? Social media analysis has strategic value? What vendor seems to have provided “inputs” to the Trump campaign and the Brexit now crowd? HonkinNews reveals the hot outfit making social media data output slick moves. We provide a run down of some semantic “news” which found its way to Harrod’s Creek. SEO, writing tips, and a semantic scorecard illustrate the enthusiasm some have for semantics. We’re not that enthusiastic, however. Google is reducing its losses from its big bets like the Loon balloon. How much? We reveal the savings, and it is a surprising number. And those fun and friendly robots. Yes, the robots. You can view the video at this link. Google Video provides a complete run down of the HonkinNews programs too. Just search for HonkinNews.

Kenny Toth, February 7, 2017
