
Tribler: A File Finder That Legal Eagles Will Want to Check

September 3, 2014

Short honk: We learned about Tribler, a rich media file finder. There is an interesting body of content; for example rich media. The site says:

Tribler can find files for you. No need for websites. Tribler can do 100 Mbps, sadly we cannot fix slow Internet or poor swarms. Lots of “pro” features: magnet links, streaming, sub-second search, channels and our upcoming anonymous mode.

Note the word “anonymous.” Tribler can play videos. The site says, “You can watch even before the download is finished.”


For more information, navigate to

Stephen E Arnold, September 3, 2014

Yahoo Flickr Images: Does Search Work?

August 31, 2014

I think you know the answer if you are a regular reader of Beyond Search.


Finding images is a tedious and time consuming business. I know what the marketing collateral and public relations noise suggests. One can search by photographer, color, yada, yada.

The reality is that finding an image requires looking at images. Some find this fun, particularly if the client is paying by the hour for graphic expertise. For me, image search underscores how primitive information retrieval tools are.

Feel free to disagree.

To test Yahoo Flickr search, navigate to “Welcome to the Internet Archive to the Commons.” Check out the sample entry to the millions of public domain images.


Darned meaty.

To search the “Commons”, one has to navigate to the Commons page and scroll down to the search box highlighted in yellow in this screenshot:


Enter a query like this one “18th century elocution.”

Here’s what the system displayed:


I then tried this query “london omnibus 1870”.

Here’s what the system displayed:


No omnibuses.

As with many image retrieval systems, the user has to fiddle with queries until images are spotted by manual inspection.

The archive is useful. Finding images in Yahoo Flickr remains a problem for me. I thought Xooglers knew quite a bit about search. You know: Finding information when the user enters a key word or two.

Stephen E Arnold, August 31, 2014

Video Ads: Print Publisher Reveals the Unviewable Truth

May 5, 2014

I read “The Great Unwatched.” Clever title. (Keep in mind the link may go dead and you will have to hunt for a hard copy. Good luck with that, gentle reader.) The main point is that video ads do not draw eyeballs. Er, this is a revelation I suppose. What I find interesting is that in my poking into video on the Web something became obvious years ago; to wit, put up a lot of videos and the videos don’t get much action. Sure, there may be a breakaway video that draws lots of eyeballs, but those viral wonders are tough to predict.

Now, what about ads? People want to turn them off or ignore them. There is a reason that regular TV commercials blast sound. Couch potatoes and walking media consumers want what they want, not what advertisers want them to want.

The New York Times reports, as real journalists do, the following:

By many estimates, more than half of online video ads are not seen, either because they are buried low on web pages or run in tiny, easily ignored video players on those pages, or run simultaneously with other ads. Vindico, an ad management platform company, deemed 57 percent of two billion video ads surveyed over two months to be “unviewable.”

There you have it. Most people don’t watch video ads.

I thought that Google’s gyrations were a pretty strong hint that video ads were an issue. The companies pumping money into ad videos may not be overwhelmed with customer demands for their products. The Web site data we examined showed that video was fun to talk about, often fun to produce, and probably fascinating for a handful of people. But getting the videos watched was a problem. If videos are not watched, what’s this mean for video ads? My understanding is that video ads are a sales disappointment.

There are some interesting implications. First, Google and others looking for video to deliver the next influx of easy money may have to rethink their assumptions. Second, fun stuff like making videos may have the value of a ride on a roller coaster. Once the ride is over, more fun requires another ride. There is limited satisfaction from the carnival attraction. Third, marketers may find themselves looking for a way to generate leads and make sales that actually works. In short, video dreams disappear like the image on a display screen when the power cuts off.

Making an ad video is way more entertaining than watching a video ad. Just don’t tell anyone who does not “get” the joy of non linear editing.

Stephen E Arnold, May 5, 2014

Sail Labs Sets Up a One Stop Download Shop

January 20, 2014

Rather than having to read and click through an entire Web site, Sail Labs Technology took a page out of simplicity’s book and placed all of their information in the Download Center. Sail Labs does not dump all of their information in one part of the Web site and wish visitors good luck. They follow the usual Web 2.0 format and a standard organizational regimen. The Download Center acts as more of an index with the entire Web site’s information downloadable in PDFs.

Sail Labs is a world-leading developer in speech technology and multimedia analysis.

“We address the markets of rich media indexing and communication mining, offering cutting-edge technologies in areas such as automatic speech recognition, speaker identification, entity-and topic detection across multiple languages, geographies, and sources. Visualization components (clustering, relationships, trends, GIS) and ontologies complete our product portfolio.”

The company is based in Vienna, Austria. Sail Labs has grown from a small company and continues to garner potential investors and create high-quality software. Sail Labs still remains loyal to its roots by being 100% Austrian owned. Its headlining products are the Media Mining Indexer, which allows users to process speech from multiple sources and produce real-time annotated text output, and the OSINT line, which creates actionable intelligence based on multiple sources.

Sail Labs may not have all of the glamour and glitz of Nuance, but they do have a compelling resume based on all of the information in the Download Center.

Whitney Grace, January 20, 2014

Sponsored by, developer of Augmentext

Old Media Scripps Buys Newsy for Digital Video Platform

December 17, 2013

An article on TechCrunch titled “Scripps Buys Newsy For $35M to Expand from TV and Newspapers to Digital Video” explains the acquisition of Newsy, a digital video news startup, by Scripps, the TV and newspaper magnate. A YouTube video heralds the acquisition, which should be made final at the beginning of 2014.

The article explains:

“This is about Scripps… buying an asset that gives it a digital video component to complement its existing TV and online services — effectively a bridge between the three areas where it already does business if you also count newspapers. It also gives the company access into an audience that consumes their news (and video) on devices like tablets, and has largely turned away from some of those more traditional platforms where Scripps still bases a majority of its business.”

Old media is on the move (quite old: Scripps was founded in 1879). The company spent $35M on the acquisition, which it believes will bring it into the next generation of digital audiences. Newsy’s ad-supported videos are presently sent through web, mobile, tablet and certain TV platforms. The article suggests that the partnership Newsy had with AOL, Microsoft and Mashable may continue, but the companies haven’t announced their plans yet.

Chelsea Kerwin, December 17, 2013


YouTube: Search for Comments

November 18, 2013

I am not a video goose. I cannot recall the last time I commented on a video. However, I have asked some of my researchers to search for YouTube comments. Well, my recollection is that YouTube “comments” search is not particularly helpful.

I read “Forced Google Plus Integration on YouTube Backfires, Petition Hits 112,000.” I learned that Google is requiring a Google Plus account in order to make comments about a YouTube video. Some YouTube fans are not happy. The big question is, “Will Google listen?”

What is important is that the article reports a modest movement to post YouTube comments on Reddit. Reddit’s search function leaves something to be desired. However, my researchers have informed me that Reddit search does work reasonably well.

My view is that Google is trying to cement its revenue opportunities. Google Plus is part of the grand strategy. Search is not number one on the agenda in my opinion. The emergence of an option like Reddit may be an important step. Google fans may have to fend for themselves as Google works overtime to make sure it can hit its revenue numbers.

Those criticizing Google may find that their actions misfire.

Stephen E Arnold, November 18, 2013

Readin, Riten, Rithmatik, and Guzzlin

November 8, 2013

I read “America’s Media Guzzling Ways.” Good word “guzzling” or “guzzlin” as it is pronounced in rural Kentucky. The write up contained a factoid that I find difficult to grasp; to wit:

The amount of media data, measured in printed text, that Americans consumed last year. That’s 6.9 zettabytes—6.9 million-million gigabytes—to be exact.

Let’s assume that the figure is dead accurate or a couple of zettabytes, plus or minus. According to the article, each person in the US spends 15 hours a day checking Facebook, watching videos, and tapping screens.
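For readers who want to check the unit conversion in the quoted factoid, here is a minimal Python sketch. It assumes decimal (SI) units, with one zettabyte equal to 10^21 bytes and one gigabyte equal to 10^9 bytes; the function name is made up for illustration and comes from neither the article nor any library.

```python
from fractions import Fraction

BYTES_PER_ZETTABYTE = 10**21  # SI definition (assumption: decimal, not binary, units)
BYTES_PER_GIGABYTE = 10**9    # SI definition


def zettabytes_to_gigabytes(zettabytes):
    """Convert zettabytes to gigabytes using exact rational arithmetic."""
    return Fraction(zettabytes) * BYTES_PER_ZETTABYTE / BYTES_PER_GIGABYTE


# The quoted figure: 6.9 zettabytes.
gigabytes = zettabytes_to_gigabytes("6.9")

# A "million-million" is 10**12, so this expresses the total in those units.
million_millions = gigabytes / 10**12
print(float(million_millions))
```

Under those assumptions the quoted phrase "6.9 million-million gigabytes" is consistent with 6.9 zettabytes.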

My reaction is that the consumption of media contributes to these observed events yesterday:

  1. A sponsored event at a trade show was attended by about 15 people. None of those at the hoe down were employees of the company. I suppose the guzzling of digital content was more important than showing up and pretending to be thrilled that potential customers were eating free snacks and drinking no name beverages. YouTube cannot wait, people.
  2. A conference program that did not include information about one of the speakers. Heck, it was an oversight even though that speaker was paid to attend, received a free hotel room, and a free registration. Facebook posts take priority with this outfit I surmise.
  3. A sign at the National Press Club that contained a misspelling. “It is the spelling checker’s fault” was one explanation. SMS spelling is the way to go. LOL
  4. Asking for directions from a bus driver elicited this statement when I asked, “Where is 999 9th Street, NW?” The professional driver replied, “Dude, my iPhone is not connecting. Ask someone else.” The bus driver did not meet my gaze. He was frantically scanning the street for a mobile phone shop.

The article helps me understand why information presented on a mobile device is perceived as accurate, complete, and current. The grazing public has neither the time nor the grit to do much reading, writing, or arithmetic I fear. Oh, as the National Press Club sign maker would have it: Readin, writin, rithmetic, and guzzlin.

One person looked for Cuba Libre Restaurant using Google Maps. No joy. The system displayed four choices, none of which was the desired restaurant. The smart system made it impossible for the iPhone user to locate the destination. Fascinatin’.

Stephen E Arnold, November 8, 2013

SharePoint Being Prepped For Rich Media Content

October 4, 2013

According to PR Newswire, a very important event will take place on September 26, 2013: “Equilibrium And Metalogix To Discuss How To Optimize SharePoint For Rich Media.” Executives from both companies will host a webinar called “Enhancing SharePoint to Manage Large Files including Rich Media Content.” The presenters will be Sean Barger, Founder, and Laura Clemons, VP Product Management, from Equilibrium and Trevor Hellebuyck, CTO of Metalogix. The group will describe the newest solutions for making SharePoint capable of working with rich media, including scalability, management of large digital media asset libraries, mobility, audio, video, and CAT storage/distribution.

Here is a more detailed list of the topics:

“During the event, the presenters will discuss how the combination of Equilibrium’s MediaRich ECM for SharePoint and Metalogix StoragePoint can improve any Microsoft SharePoint deployment without requiring modifications. Attendees will also learn:

  • Best practices for management of large files in SharePoint
  • How to overcome common issues, such as slow uploads/downloads and time-outs
  • How to optimize SharePoint for video playback”

Rich media is the next phase of content management as documentation moves away from basic paper replication. It is important to be able to search these content types, as Arnold IT’s Steve Arnold has mentioned, because as the content changes, search needs to become richer and more thorough to meet the demands.

Whitney Grace, October 4, 2013

Rich Media: Too Expensive to Store?

July 30, 2013

I saw an interesting post called “Cost of Storing All Human Audio visual Experiences.” I am no logician, but if one stores “all”, then isn’t the cost infinite? The person writing the post presents some data which pegs the cost for seven billion people at about $1 trillion a year.
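To put the headline number in per-person terms, here is a rough Python sketch. The inputs are only the round figures mentioned above (about $1 trillion a year, seven billion people); the function name is hypothetical, and the result is back-of-the-envelope arithmetic, not the post's own breakdown.

```python
def cost_per_person(total_cost_usd, population):
    """Spread a total yearly cost evenly across a population."""
    return total_cost_usd / population


# Round figures from the post: about $1 trillion a year for seven billion people.
annual_usd = cost_per_person(1_000_000_000_000, 7_000_000_000)
daily_usd = annual_usd / 365  # assumption: a 365-day year

print(f"per person per year: ${annual_usd:.2f}")
print(f"per person per day:  ${daily_usd:.2f}")
```

On those round numbers the estimate works out to a little under $150 per person per year.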

Several observations:

  1. With the emergence of smart nanodevices with audio and video capabilities, perhaps the estimate is off the mark?
  2. Once the data are captured, who manages the content? Likely candidates include nation states, companies which operate as nation states, or venture funded start ups?
  3. How does one find a particular time segment germane to a query pertinent to a patent claim?

Interesting question if one sets aside the “all”. The next time I look for a video on YouTube or Vimeo, I will ask myself, “What type of search system is needed to deal with even larger volumes of rich media?”

Is the new Dark Ages of information access fast approaching? Yikes! Has the era already arrived?

Stephen E Arnold, July 30, 2013

Sponsored by Xenky

Image Search and, of Course, Google

June 13, 2013

Many years ago I lectured in Japan. On that visit, I saw a demonstration of a photo recognition system. Click on a cow and the system would return other four legged animals, most of the time. Some years later I was asked to review facial recognition systems after a notable misfire in a major city. Since then, my team and I check out the systems which become known to us.

Progress is being made. That’s encouraging. However, a number of challenges have to be resolved. These range from false positives to context failures. In the case of a false positive, the person or thing in the picture is not the person or thing one sought. In the case of context failure, the cow painted on the side of a truck is not the same as a cow standing in a field with many other cows clumped around.

Software is bumping up against computational boundaries. The methods available have to be optimized to run in available resources. When there are bigger and faster systems, then fancier math can be used. Today’s innovations boil down, in my opinion, to clever manipulations of well known systems and methods. The reason many software systems perform in a similar manner is that these systems share many procedures. Innovation is often optimization and packaging, not a leap frog to more sophisticated numerical procedures. Trimming, chopping down, and streamlining via predictive methods are advancing the ball down the field.

I read with interest “Improving Photo Search: A Step across the Semantic Gap.” Google has rolled out enhanced photo search. The system works better than other systems. As Google phrases it:

We built and trained models similar to those from the winning team using software infrastructure for training large-scale neural networks developed at Google in a group started by Jeff Dean and Andrew Ng. When we evaluated these models, we were impressed; on our test set we saw double the average precision when compared to other approaches we had tried. We knew we had found what we needed to make photo searching easier for people using Google. We acquired the rights to the technology and went full speed ahead adapting it to run at large scale on Google’s computers. We took cutting edge research straight out of an academic research lab and launched it, in just a little over six months. You can try it out at

Why the success now? What is new? Some things are unchanged: we still use convolutional neural networks — originally developed in the late 1990s by Professor Yann LeCun in the context of software for reading handwritten letters and digits. What is different is that both computers and algorithms have improved significantly. First, bigger and faster computers have made it feasible to train larger neural networks with much larger data. Ten years ago, running neural networks of this complexity would have been a momentous task even on a single image — now we are able to run them on billions of images. Second, new training techniques have made it possible to train the large deep neural networks necessary for successful image recognition.
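The phrase "double the average precision" may be easier to picture with a toy calculation. The Python sketch below uses one common textbook form of average precision for a single ranked result list, normalized by the number of relevant items that appear in the list. It is an illustration under that assumption, not the evaluation Google ran, and the two result lists are invented.

```python
def average_precision(relevance):
    """Average precision for one ranked list of True/False relevance flags.

    Precision is sampled at each rank where a relevant item appears,
    then averaged over the relevant items found in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Invented example: the same two relevant photos, ranked low versus ranked first.
baseline = average_precision([False, True, False, True])
improved = average_precision([True, True, False, False])
print(baseline, improved)
```

In this made-up case, moving the two relevant images to the top of the list takes the score from 0.5 to 1.0, which is the kind of change "double" would describe.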

The use of “semantics” is also noteworthy. As I wrote in my analysis of Google Voice for a large investment bank, “Google has an advantage because it has data others do not have.” When it comes to predictive methods and certain types of semantics, the Google data sets give it an advantage over some rivals.

What applied to Google Voice applies to Google photo search. Google is able to tap its data to make educated guesses about images. The semantics and the infrastructure have a turbo boosting effect on Google.

The understatement in the Google message should not be taken at face value. The Google is increasing its lead over its rivals and preparing to move into completely new areas of revenue generation. Images? A step but an important one.

Stephen E Arnold, June 13, 2013

