Podcast Search Service

December 18, 2015

I read “Podcasting’s Search Problem Could be Solved by This Spanish Startup.” According to the write up:

Smab’s web app will automatically transcribe podcasts, giving listeners a way to scan and search their content.

What’s the method? I learned from the article:

The company takes audio files and generates text files. If those text files are hosted on Smab’s site, a person can click on a word in the transcript and it will take them directly to that part of the recording, because the transcript and the text are synced. In fact, a second program assesses the audio to determine where sentences begin, making it easier to find chunks of audio. Both functions are uneven, but it’s worth noting here that the company is in a very early stage.
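
The click-a-word-to-seek trick is easy to picture once the speech-to-text step emits a start time for every word. Here is a minimal sketch of that syncing logic; the transcript format and function names are my illustrative assumptions, not Smab's actual implementation:

```python
# Illustrative sketch of transcript-to-audio syncing, not Smab's code.
# Assumes the speech-to-text step emits a start time for every word.

from bisect import bisect_right

# Hypothetical ASR output: (start_seconds, word) pairs.
transcript = [
    (0.0, "Welcome"), (0.6, "to"), (0.8, "the"), (1.0, "podcast"),
    (2.4, "today"), (2.9, "we"), (3.1, "discuss"), (3.7, "search"),
]

def seek_offset(word_index: int) -> float:
    """A click on word N becomes a player seek to that word's start time."""
    return transcript[word_index][0]

def word_at_time(seconds: float) -> str:
    """Reverse lookup: which word is being spoken at a given offset."""
    starts = [start for start, _ in transcript]
    i = max(bisect_right(starts, seconds) - 1, 0)
    return transcript[i][1]

print(seek_offset(7))     # 3.7 -> player.seek(3.7)
print(word_at_time(2.5))  # today
```

The same timestamp list works in both directions: a click maps a word to an audio offset, and a playback position maps back to the word to highlight.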

There are three challenges in automatic voice-to-text indexing of audio and video sources:

First, there is a great deal of content. The computational cost to convert a large chunk of audio data to a searchable form and then offer a reasonably robust search engine is significant.

Second, selectivity requires an editorial policy. Business and government are likely paying customers, but the topics these folks chase change frequently. The risk is that a paying customer will be disappointed and drop the service. Thus, sustainable revenue may be an issue.

Third, indexing podcasts and YouTube videos is work that Apple handles rather offhandedly and that YouTube performs as part of Google's massive investment in search. The fact that neither of these firms has pushed forward with more sophisticated search systems suggests that market demand may not be significant.

I hope the Smab service becomes available. Worth watching.

Stephen E Arnold, December 21, 2015

Terms and Military Symbols Explicated

December 18, 2015

Do you want to know what after action review or mission creep means to someone in the US government? Now available is “ADRP 1-02 Terms and Military Symbols.” The 350-page document is darned useful for those who do not have a G-2 around to clarify such terms.

Stephen E Arnold, December 18, 2015

Topsy: Good Bye, Gentle Search Engine

December 18, 2015

I used Topsy as a way to search certain social content. No more. The service, she be dead.

The money-constrained Apple has shut down the public Topsy search system; see “Social Analytics Firm Topsy Shut Down by Apple Two Years After Purchase.”

If you want a recommendation for an alternative, sorry, I don’t have one. There are some solutions that are not free to the general public. The gateways to social media content require money and a bit of effort. If you cannot search content, maybe the content does not exist? That’s a comforting thought unless one knows that the content is available, just not searchable by a person with an Internet connection in a public library, at home, or from the local Apple store.

Stephen E Arnold, December 21, 2015

New Patent for a Google PageRank Methodology

December 18, 2015

Google was recently granted a patent for a different approach to page ranking, we learn from “Recalculating PageRank” at SEO by the Sea. Though the patent was just granted, the application was submitted back in 2006. Writer Bill Slawski informs us:

“Under this new patent, Google adds a diversified set of trusted pages to act as seed sites. When calculating rankings for pages, Google would calculate a distance from the seed pages to the pages being ranked. A use of a trusted set of seed sites may sound a little like the TrustRank approach developed by Stanford and Yahoo a few years ago as described in Combating Web Spam with TrustRank (pdf). I don’t know what role, if any, the Yahoo paper had on the development of the approach in this patent application, but there seems to be some similarities. The new patent is: Producing a ranking for pages using distances in a Web-link graph.”

The theory behind trusted pages is that “good pages seldom point to bad ones.” The patent’s inventor, Nissan Hajaj, has been a Google senior engineer since 2004. See the write-up for the text of the patent, or navigate straight to the U.S. Patent and Trademark Office’s entry on the subject.
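
The mechanics described in the patent write-up lend themselves to a toy illustration: compute each page's link distance from a trusted seed set with a breadth-first search, then score pages so that shorter distances rank higher. The graph, seed list, and scoring function below are my own assumptions for illustration, not the patent's actual formulas:

```python
# Toy illustration of ranking pages by link distance from trusted seeds.
# The graph, seeds, and score() are illustrative assumptions, not the
# formulas in Google's patent.

from collections import deque

# Directed web-link graph: page -> pages it links to.
links = {
    "seed.example":  ["a.example", "b.example"],
    "a.example":     ["c.example"],
    "b.example":     ["c.example", "spam.example"],
    "c.example":     [],
    "spam.example":  ["spam2.example"],
    "spam2.example": [],
}

def distances_from_seeds(graph, seeds):
    """Breadth-first search from all seeds at once; returns hop counts."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist

def score(distance):
    """Shorter distance from trusted seeds -> higher score."""
    return 1.0 / (1.0 + distance)

dist = distances_from_seeds(links, ["seed.example"])
for page in sorted(dist, key=dist.get):
    print(page, dist[page], round(score(dist[page]), 2))
```

Pages the trusted set reaches in a few hops score well; pages reachable only through long chains (the "spam" branch here) fade, which is the intuition behind "good pages seldom point to bad ones."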


Cynthia Murrell, December 18, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Textio for Text Analysis

December 17, 2015

I read “Textio, A Startup That Analyzes Text Performance, Raises $8M.” The write up reported:

Textio recognizes more than 60,000 phrases with its predictive technology, Snyder [Textio’s CEO] said, and that data set is changing constantly as it continues to operate. It looks at how words are put together — such as how verb dense a phrase is — and at other syntax-related properties the document may have. All that put together results in a score for the document, based on how likely it is to succeed in whatever the writer set out to do.
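
Stripped of the secret sauce, that description reads like classic feature-based text scoring: extract syntax features (verb density among them) and feed them to a trained model that outputs a probability of success. A bare-bones sketch of such a pipeline follows; the features, weights, and verb list are invented for illustration, since Textio's real model is proprietary:

```python
# Bare-bones sketch of feature-based document scoring in the Textio vein.
# Features, weights, and the verb list are invented for illustration.

import math

COMMON_VERBS = {"is", "are", "write", "build", "drive", "lead", "create"}

def features(text: str) -> dict:
    words = text.lower().split()
    n = max(len(words), 1)
    return {
        "verb_density": sum(w in COMMON_VERBS for w in words) / n,
        "avg_word_len": sum(len(w) for w in words) / n,
        "length": n,
    }

# Pretend these weights came from training on labeled documents.
WEIGHTS = {"verb_density": 3.0, "avg_word_len": -0.2, "length": 0.01}
BIAS = -0.5

def document_score(text: str) -> float:
    """Logistic score in [0, 1]: higher means more likely to 'succeed'."""
    f = features(text)
    z = BIAS + sum(WEIGHTS[k] * f[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

print(round(document_score("We build tools that drive results"), 2))
```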

The secret sauce bubbles in this passage:

it’s important that it [Textio] feels easy to use — hence the highlighting and dropdown boxes rather than readouts.

Textio’s Web site states:

From e-commerce to real estate to marketing content, Textio was founded on this simple vision: how you write changes who you reach, and using data, we can predict ahead of time how you’re going to do.

The company, according to Crunchbase, has received $9.5 million from Emergence Capital Partners and four other firms.

There are a number of companies offering sophisticated text analysis, but Textio may be one of the few providing user-friendly tools to help people write and make sense of the nuances in résumés and similar corpuses.

It is encouraging to me that a subfunction of information access is attracting attention as a stand-alone service. One of the company’s customers is Microsoft, a firm with homegrown text solutions and technologies from Fast Search & Transfer and Powerset, among other sources. Microsoft’s interest in Textio underscores that text processing that works as one hopes is an unmet need.

Stephen E Arnold, December 17, 2015

Need an Open Source Semantic Web Crawler?

December 17, 2015

If you do, the beleaguered Yahoo has some open source goodies for you. Navigate to “Yahoo Open Sources Anthelion Web Crawler for Parsing Structured Data on HTML Pages.” The software, states the write up, is “designed for parsing structured data from HTML pages under an open source license.”

There is a statement I found darned interesting:

“To the best of our knowledge, we are first to introduce the idea of a crawler focusing on semantic data, embedded in HTML pages using markup languages as microdata, microformats or RDFa,” wrote authors Peter Mika and Roi Blanco of Yahoo Labs and Robert Meusel of Germany’s University of Mannheim.
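
Novelty claims aside, a crawler focused on semantic markup is straightforward to sketch: score each fetched page by the microdata or RDFa attributes it carries and let that score steer the crawl frontier. The scoring heuristic and frontier policy below are my assumptions, not Anthelion's actual algorithm:

```python
# Minimal sketch of a crawler that prioritizes pages carrying semantic
# markup (microdata/RDFa). The scoring heuristic and frontier policy are
# illustrative assumptions, not Anthelion's actual algorithm.

import heapq
from html.parser import HTMLParser

SEMANTIC_ATTRS = {"itemscope", "itemtype", "itemprop",  # microdata
                  "vocab", "typeof", "property"}        # RDFa

class SemanticCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hits = 0
        self.links = []

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        self.hits += len(names & SEMANTIC_ATTRS)
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def page_priority(html: str):
    """Count semantic attributes and collect outlinks from one page."""
    parser = SemanticCounter()
    parser.feed(html)
    return parser.hits, parser.links

# Frontier as a max-heap keyed on semantic richness (negated for heapq).
frontier = []
html = ('<div itemscope itemtype="http://schema.org/Person">'
        '<span itemprop="name">Ada</span><a href="/next">more</a></div>')
hits, links = page_priority(html)
for link in links:
    heapq.heappush(frontier, (-hits, link))
print(heapq.heappop(frontier))  # (-3, '/next'): most semantic page first
```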

My immediate thought was, “Why don’t these folks take a look at the 2007 patent documents penned by Ramanathan Guha?” Those documents present a rather thorough description of a semantic component which hooks into the Google crawlers. Now the Google has not open sourced these systems and methods.

My reaction is, “Yahoo may want to ask the former Googlers who are now working at Yahoo how novel the Yahoo approach really is.”

Failing that, Yahoo may want to poke around in the literature, including patent documents, to see which outfits have trundled down the semantic Web crawling road before. Would it have been more economical and efficient to license the Siderean Software crawler and build on that?

Stephen E Arnold, December 17, 2015

Weekly Watson: The Internet of Things

December 17, 2015

Yep, there is not a buzzword, trend, or wave which IBM’s public relations professionals ignore. I read “IBM Is Bringing Its Watson Supercomputer to IoT.” The headline puzzled me. I thought that Watson was:

  • Open source software like Lucene
  • Home brew scripts
  • Acquired technology.

The hardware part is moving to the cloud. IBM is reveling in a US government supercomputing contract which may involve quantum computing.

But Watson runs on hardware. If Watson is a supercomputer, I see some parallels with the Google and Maxxcat search appliances.

The write up reports:

IBM has announced today it is bringing the power of its Watson supercomputer to the Internet of Things, in a bid to extend the power of cognitive computing to the billions of connected devices, sensors and systems that comprise the IoT.

Will the Watson Internet of Things be located in Manhattan? Nope. I learned:

the company announced that the new initiative, the Watson Internet of Things, will be headquartered in Munich, Germany. The facility will serve as the first European Watson innovation super centre, built to drive collaboration between IBM experts and clients. This will be complemented by eight Watson IoT Client Experience Centers spread across Europe, Asia and the Americas.

Why Germany? IBM has a partner, Siemens.

Will the IoT venture use the shared desk approach? According to a December 10, 2015, comment on EndicottAlliance.org, this approach to work has some consequences:

I wouldn’t get too excited about the new “Agile Workspace” in RTP. Basically it is management forcing workers back to the office and into a tense, continuously monitored environment with no privacy. It will be loud, you’ll have no space of your own, and it will be difficult to think. Mood marbles? Better be sure you always choose the light-colored ones! And make sure your discussion card is always flipped to the green side. What humiliation! The environment will be great for loud-mouthed managers, terrible for workers who do all the work. Worse than cubicles.

From cookbooks to cancer, IBM Watson seems to be where the buzzwords are. I wonder if the Watson revenues will reverse the revenue downturns IBM has experienced for 14 consecutive quarters.

Stephen E Arnold, December 17, 2015

Real Journalists Do Not Know Who Owns Their Newspaper

December 17, 2015

I find “real” journalists and traditional publishers an endless source of amusement. I read “Reporters in Las Vegas Try to Crack Case of Whom Really Owns Their Newspaper.” You may be able to read this article online, but you may have to pay or perform some act of obeisance to the Gray Lady. This write up, which appeared on December 15, 2015, in my dead tree copy on page B-3, without much self-awareness of the irony of the situation, appeared in the New York Times. Yep, that’s the newspaper that Jeff Bezos wants to leapfrog with his almost new Washington Post.

I read the facts, just the facts:

It was not clear precisely who is behind the company [a newspaper called The Review Journal], which was very recently incorporated.

And there are the senior managers of this esteemed outfit:

The paper will continue to be managed by Gatehouse Media, a subsidiary of New Media Investment Group. All but the company’s most senior executives and those most directly involved with the paper are unaware of the new owner’s identity.

If I understand this, someone paid $140 million for a newspaper with real journalists, and the journalists cannot find out who owns the newspaper.

Let’s look on the bright side. Public relations professionals can submit information to this type of real newspaper and expect to get their story nudged forward. It seems like taking the easy path is what some real journalists do.

And the New York Times? I don’t think their “real” journalists could find the answer either. What does this suggest about the ability of “real” journalists to obtain information?

Boy, I don’t know. Perhaps we could ask Rupert Murdoch or some of his professionals who found creative ways to gather information.

I wonder if any writers for comedians are monitoring this story. Where are Jack Benny’s writers when one needs them?

Stephen E Arnold, December 17, 2015

Canada-Based Coveo Predicts the Future of Enterprise Search

December 17, 2015

Coveo, an enterprise search vendor once closely focused on Microsoft SharePoint, offered some predictions about 2016 and search.

What does the Québec City-based search vendor presage? The answer appears in “Coveo Releases 2016 Enterprise Search Predictions.” Here you go:

  • Machine learning. IBM will definitely support this prediction or their public relations advisors will.
  • Cybersecurity. Yep. This is a hot area. Some of the companies pushing new boundaries in search are outfits like Diffeo, Tiberian, and Recorded Future (backed by the Alphabet Google outfit). Keyword search does not sleep in the same king-sized bed as these outfits, in my opinion.
  • Real time engagement and learning for digital transformation. I thought of Yahoo. The company has several thousand people working on search. How is that working out for the Xoogler running Yahoo?
  • Proactive recommendations. Okay, this is the selective dissemination of information or SDI thing. SDIs are useful, but they cannot be allowed to become annoying. Does anyone remember Desktop Data? I do.
  • Search makes systems more intelligent. Well, maybe. Search is a utility. The more successful of the intelligence automation outfits use subcomponents of search; for example, entity extraction, metadata generated on the fly, and relationship maps (see the sketch after this list). These are, in my opinion, more important than old school search.
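
For the curious, here is what the entity extraction subcomponent looks like in practice, using the open source spaCy library as a stand-in; the Coveo write up names no specific tooling, and this sketch assumes spaCy and its small English model are installed:

```python
# Minimal entity-extraction sketch with spaCy, one of the search
# subcomponents mentioned above. Assumes `pip install spacy` and
# `python -m spacy download en_core_web_sm` have been run.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Recorded Future, backed by Google, analyzes threat data "
          "from Boston and Stockholm.")

# Extracted entities become on-the-fly metadata for each document.
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)

# Naive relationship map: entities that co-occur in a sentence get an edge.
for sent in doc.sents:
    ents = [e.text for e in sent.ents]
    edges = [(a, b) for i, a in enumerate(ents) for b in ents[i + 1:]]
    print(edges)
```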

The write up includes a remarkable quote. I have placed this gem in my folder for future reference. Here is the statement I highlighted in bold red marker ink:

2016 will be a pivotal year for enterprise search. Organizations now recognize the strategic value of using intelligent search-based applications to drive more customer and employee engagement.

Based on the information available to me as we complete The Dark Web Dilemma, 2016 is going to be a bit different from 2015, when venture money flowed like water to outfits in the content processing game. My thought about the future is that companies which have ingested oodles of cash have to generate revenue growth, demonstrate a path to profitability, or face a Convera, Delphes, Entopia, or Siderean Software-type future.

What’s that type of future?

A sellout or a shutdown. My hunch is that the stakeholders in the Coveo play are looking for a buyer, just like Lexmark.

Stephen E Arnold, December 17, 2015

Old School Mainframes Still Key to Big Data

December 17, 2015

According to ZDNet, “The Ultimate Answer to the Handling of Big Data: The Mainframe.” Believe it or not, a recent survey of 187 IT pros from Syncsort found the mainframe to be important to their big data strategies. IBM has even created a Hadoop-capable mainframe. Reporter Ken Hess lists some of the survey’s findings:

*More than two-thirds of respondents (69 percent) ranked the use of the mainframe for performing large-scale transaction processing as very important

*More than two-thirds (67.4 percent) of respondents also pointed to integration with other standalone computing platforms such as Linux, UNIX, or Windows as a key strength of mainframe

*While the majority (79 percent) analyze real-time transactional data from the mainframe with a tool that resides directly on the mainframe, respondents are also turning to platforms such as Splunk (11.8 percent), Hadoop (8.6 percent), and Spark (1.6 percent) to supplement their real-time data analysis […]

*82.9 percent and 83.4 percent of respondents cited security and availability as key strengths of the mainframe, respectively

*In a weighted calculation, respondents ranked security and compliance as their top areas to improve over the next 12 months, followed by CPU usage and related costs and meeting Service Level Agreements (SLAs)

*A separate weighted calculation showed that respondents felt their CIOs would rank all of the same areas in their top three to improve

Hess goes on to note that most of us probably utilize mainframes without thinking about it; whenever we pull cash out of an ATM, for example. The mainframe’s security and scalability remain unequaled, he writes, by any other platform or platform cluster yet devised. He links to a couple of resources besides the Syncsort survey that support this position: a white paper from IBM’s Big Data & Analytics Hub and a report from research firm Forrester.


Cynthia Murrell, December 17, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

