Social Network Data and the Future of Research Bibliographies

May 15, 2010

A two for one link. Navigate to HubSpot’s Inbound Internet Marketing Blog and “The Ultimate List: 300+ Social Media Statistics.” Useful data collection. Well done. On the downside, this article helped me see the future of bibliographies. Years ago I studied with JJ Campbell, one of the editors of the Chaucer corpus. We made lists. The new bibliography happily romps across images, videos, and supporting links. JJ would probably have objected if I submitted my work as multimedia.

Stephen E Arnold, May 15, 2010

Endeca Moves toward Video Search

April 22, 2010

I am putting the finishing touches on Google Beyond Text and came across a news release from Endeca with the catchy title “Endeca Extend Partner Program Adds Leading Video Search Software Vendors”. I was intrigued, partly because I could not figure out the “extend” and “video search” notions. The idea seems to be a good one. With interest in non text content drifting upwards, Endeca is taking steps to allow its McKinley search platform to process video objects. According to the release:

Inaugural Endeca Extend partners in the video search category include 3Play Media, Brightcove and Nexidia. The majority of video and audio files do not have highly attributed meta-data surrounding them. However, through the Endeca Extend program, Endeca and its partners allow customers to use extracted meta-data and high quality, time-synchronized transcripts to increase search recall for audio and video content, and provide new facets for Guided Navigation, cluster related topics, offer landing pages, and improve search relevancy. Endeca customers can easily run their data through an Endeca Extend partner solution, extract additional meta-data elements or transcripts from the most common audio and video file formats and append that information to the original content. Through the partner solutions, search and navigation results will also offer segment-specific playback capabilities for audio and video content. This lowers the integration costs and adds significant structure to the content to enhance the overall user experience. The pre-built integrations allow joint customers the ability to implement best-of-breed technologies without sacrificing ease of integration.

Will Endeca gain traction in the fiercely competitive video search sector? Many organizations put their videos on YouTube and link to them. The pointers and description of the video are text descriptions of the videos. The SEO crowd is chattering about the usefulness of videos and descriptions of them in a Google PageRank effort. We are not too sure about the SEO angle, but we know video is hot for the under 25 crowd.

In our experience, talking about integration of video content and implementing video search can be one of those management tasks where slips between cup and lip can occur. More information is available directly from Endeca at www.endeca.com.

Stephen E Arnold, April 22, 2010

Unsponsored post.

The Seven Forms of Mass Media

April 21, 2010

Last evening on a pleasant boat ride on the Adriatic, a number of young computer scientists-to-be were asking about my Google lecture. A few challenged me, but most seemed to agree with my assertion that Google has a large number of balls in the air. A talented juggler, of course, can deal with five or six balls. The average juggler may struggle to keep two or three in sync.

One of the students shifted the subject to search and “findability.” As you know, I floated the idea that search and content processing is morphing into operational intelligence, preferably real-time operational intelligence, not the somewhat stuffy method of banging two or three words into a search box and taking the most likely hit as the answer.

The question put to me was, “Search has not kept up with printed text, which has been around since the 1500s, maybe earlier. What are we going to do about mobile media?”

The idea is that we still have a difficult time locating the precise segment of text or datum. With mobile devices placing restraints on interface, fostering new types of content like short text messages, and producing an increasing flow of pictures and video, finding is harder not easier.

I remembered reading “Cell Phones: The Seventh Mass Media” and had a copy of this document on my laptop. I did not give much weight to the assertion that mobile devices were a mass medium, but I thought the insight had relevance. Mobile information comes with some interesting characteristics. These include:

  • The potential for metadata derived from the user’s mobile number, location, call history, etc.
  • The index terms in content, if the system can parse information objects or unwrap text in an image or video such as converting an image to ASCII and then indexing the name of a restaurant or other message in an object
  • Contextual information, if available, related to content, identified entities, recipients of messages, etc.
  • Log file processing for any other cues about the user, recipient(s), and information objects.

What this line of thinking indicates is that a shift to mobile devices has the potential for increasing the amount of metadata about information objects. A “tweet”, for instance, may be brief, but one could, given the right processing system, impart considerable richness to the information object in the form of metadata of one sort or another.
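The metadata layers listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `InfoObject` class, the `enrich` function, and the sample restaurant, location, and log values are all made up for the example and do not describe any vendor's system.

```python
from dataclasses import dataclass, field


@dataclass
class InfoObject:
    """A short mobile message plus whatever metadata a system can attach."""
    text: str
    metadata: dict = field(default_factory=dict)


def enrich(obj, device=None, entities=None, log_cues=None):
    """Return a new InfoObject with the available metadata layers merged in.

    All parameter names are hypothetical; each maps to one bullet in the list.
    """
    merged = dict(obj.metadata)
    if device:
        # Metadata derived from the device: number, location, call history.
        merged["device"] = dict(device)
    # Index terms: lowercase words of the message, punctuation stripped.
    terms = [w.strip(".,!?\"'").lower() for w in obj.text.split()]
    merged["index_terms"] = sorted({t for t in terms if t})
    if entities:
        # Contextual information such as identified entities or recipients.
        merged["entities"] = list(entities)
    if log_cues:
        # Any other cues recovered from log files.
        merged["log_cues"] = dict(log_cues)
    return InfoObject(obj.text, merged)


# Hypothetical sample message and metadata values.
tweet = InfoObject("Lunch at Cafe Piran, great view!")
rich = enrich(
    tweet,
    device={"location": "Portoroz"},
    entities=["Cafe Piran"],
    log_cues={"hour": 13},
)
```

The six word message ends up carrying four layers of metadata, which is the point: the object is brief, but the record around it need not be.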

The previous six forms of media—[I] print (books, magazines, and newspapers), [II] recordings; [III] cinema; [IV] radio; [V] television; and [VI] Internet—fit neatly under the umbrella of [VII] mobile. The idea is mobile embraces the other six. This type of reasoning is quite useful because it gathers some disparate items and adds some handles and knobs to the otherwise unwieldy assortment in the collection.

In the write up referenced above, I found this passage interesting: “Mobile is as different from the Internet as TV is from the radio.”

The challenge that is kicked to the side of the information highway is, “How does one find needed information in this seventh mass media?” Not very well in my experience. In fact, finding and accessing information is clumsy for textual information. After 500 years, the basic approach of hunting, Easter egg style, has been facilitated by information retrieval systems. But I think most people who look for information can point out some obvious deficiencies. For example, most retrieval systems ignore content in various languages. Real time information is more of a marketing ploy than a useful means of figuring out the pulse count for a particular concept. A comprehensive search remains a job for a specialist who would be recognized by an archivist who worked in Ephesus’ library 2500 years ago.

barokas video

Are you able to locate this video on Ustream or any other video search system? I could not, but I know the video exists. Here is a screen capture. Finding mobile content can be next to impossible in my opinion.

When I toss in the radio and other rich media content, finding and accessing pose enormous challenges to a researcher and a casual user alike. In my keynote speech on April 15, 2010, I referenced some Google patent documents. The clutch of disclosures provide some evidence that Google wants to apply smart software to the editorial job of creating personalized rich media program guides. The approach strikes me as an extension of other personalization approaches, and I am not convinced that explicit personalization is a method that will crack the problem of finding information in the seventh medium or any other for that matter.

Here’s my reasoning:

  • Search and retrieval methods for text don’t solve problems. Processing more information means longer results lists and more work to figure out where the answer is.
  • Smart systems like Google’s or the Cuil Cpedia project are in their infancy. An expert may find fault with smart software that is actually quite stupid from the informed user’s point of view.
  • Making use of context is a challenging problem for research scientists but asking one’s “friends” may be the simplest, most economical, and widely used method. Facebook’s utility as a finding system or Twitter’s vibrating mesh may be the killer app for finding content from mobile devices.
  • As impressive as Google’s achievements have been in the last 11 years, the approach remains largely a modernization of search systems from the 1970s. A new direction may be needed.

The bright young PhDs have the job of figuring out if mobile is indeed the seventh medium. The group with which I was talking or similar engineers elsewhere have the job of cracking the findability problem for the seventh medium. My hope is that on the road to solving the problem of the new seventh medium’s search challenge, a solution to finding information in the other six is discovered as well.

The interest in my use of the phrase “operational intelligence” tells me one thing. Search is a devalued and somewhat tired bit of jargon. Unfortunately substituting operational intelligence for the word search does not address the problem of delivering the right information when it is needed in a form that the user can easily apprehend and use.

There’s work to be done. A lot of work in my opinion.

Stephen E Arnold, April 20, 2010

No sponsor for this post, gentle reader.

IBM and Its New Content Delivery Initiative

April 20, 2010

I listened to a talk by an IBM innovator today and I did not hear anything about IBM’s deal with Verizon to float cloud storage nor did I hear anything about IBM’s content delivery initiative. I was puzzled because before the lecture I read “IBM Helps Media and Entertainment Industry Meet the Challenges of Delivering Content in the Digital Age.” My hunch is that IBM executives don’t know too much about other units of IBM. The same problem exists at Google, Microsoft, and other multi billion dollar companies. Read this article. Do you see similarities between what IBM has announced and Google’s suite of content delivery patent documents? These jumped right off the page and hit me between the eyes. I wonder if the similarity is a result of my having been immersed in Google patent documents and technical papers for the last nine months or if there was one of those happy coincidences that occur. Remember the calculus dust up?

IBM asserts that it offers media and entertainment companies a way to make their lives much easier. Among the features of the new system are:

  • A media enterprise framework. Unlike the repackaging of open source Apache into WebSphere, this framework sounds like a home grown solution
  • Personalized content delivery, quite similar to the Google personalization method for set top boxes and other devices
  • Business process features; that is, everything hooks together, presumably eliminating stand alone and siloized functions
  • Metadata management which makes search of content assets possible
  • Security

IBM in this article suggested to the writer:

The IBM Media Enterprise Framework is the software technology backbone that makes a wide range of media and entertainment solutions possible by helping clients to build an integrated platform for all of their operations based on industry standards. This new framework utilizes elements of IBM’s entire software portfolio including WebSphere, Rational, Tivoli, Lotus and Information Management products while leveraging the full range of IBM server and storage products and the industry-specific offerings and consulting expertise of IBM Global Business Services. Additionally, it supports the broad set of independent Software Vendors that address specific application requirements.

Okay, frameworks and backbones. Is IBM, like Google, arriving late to the content delivery party? Akamai and lots of other companies are in this space. Margins seem to be under pressure as firms vie for available accounts. Apple, despite its walled garden approach, seems to be chugging along. Google’s YouTube.com delivers lots of video. Is there a play for IBM?

We will know if IBM breaks out revenues for this new framework / backbone. My hunch is that IBM is scrambling for any new revenue opportunity it can get. The company has lots of competitors, and Fortune 1000 firms long accustomed to paying IBM big bucks or euros may be counting pennies.

My view is that IBM is cobbling together pieces, partners, and promises in hopes of striking a gusher of cash. Maybe content delivery is another commodity and not exactly what it seems to IBM’s business analysts? And what about search? Maybe another open source play?

Stephen E Arnold, April 20, 2010

A freebie.

eBook Sales to Grow

April 15, 2010

In a report from Goldman Sachs, analysts predicted growth in book sales. “U.S. Book Sales to Increase on E-Books, Goldman Says” included this statement: “Apple’s share of the e-book market will surge to 33 percent in 2015 from 10 percent this year.” Amazon, it seems, will see its share of e-book sales decline to 28 percent from 50 percent. Will e-books remain books, or will e-books morph into interactive media? Will authors of books be able to create products that will appeal to users of new devices like the Apple iPad? If publishers have to invest in software development, will increased costs of production put further pressure on author royalties?

Stephen E Arnold, April 15, 2010

Unsponsored post.

Google and Disruption: Will It Work Tomorrow?

April 15, 2010

Editor’s Note: The text in this article is derived from the notes prepared for Stephen E Arnold’s keynote talk on April 15, 2010. He delivered this speech as part of Slovenian Information Days in Portoroz, Slovenia.

Thank you, Mr. Chairman. I am most grateful for the opportunity to address this group and offer some observations about Google and its disruptive tactics.

I started tracking Google’s technical inventions in 2002. A client, now out of business, asked me to indicate if “Google really had something solid.”

My analysis showed a platform diagram and a list of markets that Google was likely to disrupt. I captured three ideas in my 2005 monograph “The Google Legacy”, which is still timely and available from Infonortics Ltd. in Tetbury, Glos.

The three ideas were:

First, Google had figured out how to add computing capacity, including storage, using mostly commodity hardware. I estimated the cost in 2002 dollars as about one-third what companies like Excite, Lycos, Microsoft, and Yahoo were paying.

Second, Google had solved the problem of text search for content on Web pages. Google’s engineers were using that infrastructure to deliver other types of services. In 2002, there were rumors that Google was experimenting with services that ranged from email to an online community / messaging system. One person, whose name I have forgotten, pointed out that Google’s internal network MOMA was the test bed for this type of service.

Third, Google was not an invention company. Google was an applied research company. The firm’s engineers, some of whom came from Sun Microsystems and AltaVista.com, were adept at plucking discoveries from university research computing tests and hooking them into systems that were improvements on what most companies used for their applications. The genius was focus and selection and integration.

image

Google is an information factory, a digital Rouge River construct. Raw materials enter at one end and higher value information products and services come out at the other end of the process.

In my second Google monograph, funded in part by another client, I built upon my research into technology and summarized Google’s patent activities between 2004 and mid 2007. Google Version 2.0: The Calculating Predator, also published by Infonortics Ltd., disclosed several interesting facts about the company.


Autonomy Amps Social and Rich Media

April 14, 2010

At the National Association of Broadcasters conference, Autonomy announced enhancements to the Virage MediaBin platform. The latest version of MediaBin “automatically forms a conceptual understanding of all rich media assets located in any internal or external repository, including social media, blogs, and videos.” Autonomy’s “meaning based computing” makes sense out of non text content. The firm said:

Autonomy Virage’s solution overcomes these challenges by enabling businesses to automatically understand the value of all digital assets created both inside and outside an organization, and dynamically deliver the right content to the right customer, every time. At the core of Autonomy’s Virage MediaBin platform is the Intelligent Data Operating Layer (IDOL) which allows businesses to automate the processing of all rich media assets. IDOL forms a conceptual understanding that allows marketers to automatically tag and classify any rich media asset, regardless of format or language. Virage MediaBin applies this intelligence to deliver advanced analytics, automatic categorization, summarization, concept clouds, dynamic content associations, content hyperlinking and automation of business processes and workflow.

In addition, the new release:

provides enhanced innovations to “watch and listen” to video. The product automatically converts video to text and time synchronizes with a preview of the content. Video assets can be quickly and easily found with pinpoint accuracy to the exact location within a video where a word or phrase is spoken. This is dynamically associated with other critical digital assets.
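To make the idea of jumping to the spot in a video where a phrase is spoken more concrete, here is a minimal sketch. It assumes the transcript is already available as a list of (start time in seconds, text) segments. The function names and the sample segments are hypothetical, and nothing here reflects how Autonomy or any other vendor actually implements the feature.

```python
from collections import defaultdict


def build_index(segments):
    """Map each lowercase word to the start times of the segments containing it."""
    index = defaultdict(list)
    for start, text in segments:
        # Use a set so a word repeated within one segment is recorded once.
        for word in {w.strip(".,!?").lower() for w in text.split()}:
            if word:
                index[word].append(start)
    return index


def find_phrase(segments, phrase):
    """Return start times of segments whose text contains the phrase."""
    needle = phrase.lower()
    return [start for start, text in segments if needle in text.lower()]


# Hypothetical time-synchronized transcript: (start_seconds, spoken text).
segments = [
    (0.0, "Welcome to the product overview."),
    (12.5, "Guided navigation helps users refine results."),
    (31.0, "The overview ends with pricing."),
]

index = build_index(segments)
overview_hits = index["overview"]                         # [0.0, 31.0]
phrase_hits = find_phrase(segments, "guided navigation")  # [12.5]
```

A player could seek to any of the returned start times. The hard part, converting speech to accurate text in the first place, is outside this sketch.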

You can get more information about MediaBin at www.autonomy.com/dam.

Stephen E Arnold, April 14, 2010

Unsponsored post.

If Books Become Apps, What about Regular Reading?

April 13, 2010

My view is that the era of text is drawing to a close. Words won’t go away, but the future is video and interactivity. Even InDesign CS5 allows moving stuff to be inserted into text. Text is becoming a sidebar. The real action is jiggly wiggly content. You can get a glimpse of the future in “The Amazing Media Habits Of 8-18 Year Olds.” The article is based on a study funded by the Kaiser Family Foundation and you can view the PowerPoint highlights on the Business Insider Web site. For me, the most interesting item in the Kaiser study is summarized in the screenshot below:

kaiser 05

If books become multimedia, then book consumption may go up. The reason is that books will be more like games or TV, two popular pastimes for the sub 18 year old set. The problem facing any traditional publisher is shown in the slide below:

kaiser 01

Looks like the sub 18 crowd is moving beyond text. Apple may be better positioned than either Amazon or Google in this sector. Which horse will win the rich media derby? A favorite or a contender off the radar at the moment?

Stephen E Arnold, April 13, 2010

Google Strengthens Visual Search Team with Plink

April 12, 2010

The fastest way to get technology and staff is to buy a company. Google, according to AFP, grabbed UK based Plink, a search system for “artwork” and pictures. You can read “Google Buys Visual Search Start-Up Plink” and get some basics. (Yahoo News links often go dead, so you may have to do some hunting for the story.) I am at an undisclosed location east of Italy and I don’t have my full Overflight service available. Some basics:

The company will keep its mobile app, but:

we won’t be updating the app and will instead focus our development efforts on Google Goggles, so you’ll see new functionality appearing there in the future.

Google has a clutch of patent documents and technical papers that address visual search, image recognition, and video segmenting. Google is moving beyond text.

Stephen E Arnold, April 12, 2010

A freebie.

The Importance of YouTube.com

April 6, 2010

Seeking Alpha’s “YouTube Much More Important Than Gmail for Google” reminded me that I don’t place sufficient emphasis on YouTube.com, Google’s controversial rich media service. I know that Google has a number of initiatives in rich media, including its recent acquisition of Episodic. The main point of the Seeking Alpha story struck me as:

For YouTube, we estimate that revenue per 1,000 page views increased from about 40 cents in 2005 to about $2.40 in 2009. We expect YouTube’s revenue per 1,000 page views to increase to nearly $10 by the end of the Trefis forecast period.
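The arithmetic behind the quoted estimate is easy to check. The sketch below uses only the two figures in the passage, both of which the source qualifies with “about”. The function names are made up for the example, and no attempt is made to project the $10 figure because the length of the forecast period is not stated.

```python
def growth_multiple(start, end):
    """How many times larger the end value is than the start value."""
    return end / start


def cagr(start, end, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end / start) ** (1 / years) - 1


rpm_2005 = 0.40  # approximate revenue per 1,000 page views, 2005
rpm_2009 = 2.40  # approximate revenue per 1,000 page views, 2009

multiple = growth_multiple(rpm_2005, rpm_2009)
annual = cagr(rpm_2005, rpm_2009, 2009 - 2005)

print(f"{multiple:.1f}x overall, {annual:.1%} per year")
# prints "6.0x overall, 56.5% per year"
```

A sixfold increase in four years works out to roughly 56 percent compound growth per year, on the quoted estimates.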

Google will have to continue its efforts to take advantage of the YouTube.com revenue opportunity. If text ads begin to deteriorate in the face of advertisers jumping to Facebook.com, Google may have to hurry its efforts to pump up YouTube.com’s financial performance.

The Seeking Alpha article presents some tasty charts, but I wonder, “Is time running out for Google in rich media?” The problem is not the iPad, which may or may not be a factor for Google. The challenge is the many different issues that Google now faces. These range from the interesting Viacom legal matter to Google’s role as a champion of uncensored Internet results. Toss in the continued interest in Facebook and the softness in certain economic data. With many complexities interacting, the uncertainty for Google may be at its highest point in the firm’s 11 year history.

YouTube.com is important, and rich media will be the making or breaking of some companies in the online space. Google wants to be on the upside of this shift from text to video, from keyword search to social information acquisition.

Stephen E Arnold, April 5, 2010

This post talks about law and international affairs. Which entity has oversight of uncompensated write ups? I will report non payment to the manager of the Northern Regional Research Lab, south of Chicago.
