When Smart Software Is Really Dense

May 4, 2010

Search and content processing has hidden rooms and musty basements. Many vendors put some apple cider in a pan, turn the burner on low, and suffuse the manse with a mouth-watering scent. Yep, that’s marketing.

“Rest in Peas: The Unrecognized Death of Speech Recognition” is not about cooking veggies. The write up does a good job of pointing out that the progress of taking a spoken string and figuring out what was said is going nowhere fast. The write up includes a number of useful comments, some tables, links to oh-so academic papers, and some blunt talk. One example:

To some, these developments are no surprise. In 1986, Terry Winograd and Fernando Flores audaciously concluded that “computers cannot understand language.” In their book, Understanding Computers and Cognition, the authors argued from biology and philosophy rather than producing a proof like Einstein’s demonstration that nothing can travel faster than light. So not everyone agreed. Bill Gates described it as “a complete horseshit book” shortly after it appeared, but acknowledged that “it has to be read,” a wise amendment given the balance of evidence from the last quarter century. Fortunately, the question of whether computers are subject to fundamental limits doesn’t need to be answered. Progress in conversational speech recognition accuracy has clearly halted and we have abandoned further frontal assaults.

I have tucked this write up in my reference folder. I also discussed the article with my trusty crew at lunch. What emerged from the conversation were three points.

First, speech recognition is just one small, dark, hidden room in search. There are many more. Hopefully Robert Fortner or a person with his knowledge will open the blinds and let the sun shine in.

Second, the write up strikes at the heart of Google’s method. At lunch, I said that Facebook uses humans to generate content and provide relevant results. Nothing could be more different from the math club approach used at Google. The references to algorithms in the write up reminded me that Google had a great idea a decade ago. That idea may be less and less useful in some of the dark rooms of search. Has Facebook found a way to open search doors? I don’t know but it is an interesting question, is it not?

Finally, I have seen other dead end charts. One example that struck me was the lack of progress in precision and recall tests. The US government’s search “competitions” are not in the public relations business. I have heard that the formal precision and recall scores of some of the participants are * very * hard to improve. We are not talking 90 percent and up. We are talking 75 percent and higher with 90 percent a darned wonderful score. How many years? A decade, give or take a year or two. Another musty room? Maybe.
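
For readers who want the two measures pinned down, here is a minimal sketch of how precision and recall are usually computed for a single query. It is an illustration only, not the scoring method used in any government test. The function name `precision_recall` and the document IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents the system actually returned

    # Guard against empty sets so the function never divides by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the system returns four documents, three of them
# relevant, but the collection holds six relevant documents in total.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d1", "d2", "d3", "d5", "d6", "d7"]

precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case the system scores 75 percent on precision and only 50 percent on recall, which shows why pushing both numbers toward 90 percent at the same time is a tall order.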

To wrap up, search is a tough problem. Search is not getting any easier either.

Stephen E Arnold, May 4, 2010

Unsponsored post.

Lucene Solr Developer Event in Prague Arrives

May 4, 2010

Lucid Imagination is hosting a developer event called Apache Lucene EuroCon in Prague from May 18-21. Insiders tell me they have attracted over 120 attendees so far, a real feat in these travel-constrained times. Some reasons might be rising interest in and adoption of Lucene/Solr, the vibrant European developer community, and the gap left by the cancellation of the 2010 Apache EuroCon.

According to the conference Web site:

Apache Lucene EuroCon 2010 is the first dedicated Lucene and Solr User Conference in Europe. This conference provides professional training on Lucene and Solr as well as a unique opportunity to learn from the search experts in two educational tracks.

The event will include 2-day Lucene and Solr Boot camp trainings, user case studies, and technical deep dives, along with keynotes from Eric Gries, Lucid’s CEO, Stephen Dunn from the Guardian, and Zack Urlocker, previously EVP at MySQL.

Stephen E Arnold, May 4, 2010

No one paid me to write this. Maybe someday!

Facebook Chases Trends

May 3, 2010

When new users sign up with Facebook, the company takes advantage of the opportunity to learn more about their likes, dislikes, and interests. Mashable.com recently reported on this in its article, “Facebook Suggests Pages to Like for New Users,” describing a new approach to making connections and enhancing the new user’s first experience with the product. Apparently Facebook is trying to get ahead of search by suggesting brands, people, etc. that you might “like”. As long as Facebook doesn’t start using it as a cheesy marketing approach to sell something, it really can’t hurt. Finding the page of an organization you are truly interested in, but might not have thought was on Facebook, is an added bonus. Key word search, which requires turning ideas into search terms, takes one more step out of the mainstream.

Melody K. Smith, May 3, 2010

Note: Post was not sponsored.

Fast Search Server 2010 Vision

May 3, 2010

Another vision for Fast Search. You can read “Vision: Fast Search Server 2010 for SharePoint: Brief Discussion” and jump into the fray. The write up explains that Fast Search often has to be defended. And then the article turns to the azure chip outfit Gartner for support and succor. The passage that I tucked in my “future reference” folder was:

We guess Microsoft realized it didn’t have a real solution for the high-end enterprise search market and that that led them to the acquisition of the Norwegian company Fast, which specializes in delivering search services for the high-end market. According to Gartner, Fast is even more: it’s the best enterprise search system in the world. The acquisition was announced in January 2008 and ultimately, in SharePoint 2010, this has lead to a fully SharePoint-integrated solution called Fast Search Server 2010 for SharePoint. Fast Search Server 2010 for SharePoint is an add-on, which means you can start using the normal SharePoint Enterprise Search features as long as you want and switch to Fast once you need high-end search features.

The article wraps up with a number of bullet points about the wonders of the Fast Search system.

Vision? Not exactly. If you are a SharePoint fan and need some cheerleading, the “Vision” write up may be your cup of tea. Not much complexity in the write up, but there may be some in the Fast Search system in my opinion.

Stephen E Arnold, May 3, 2010

Unsponsored post.

Search as Oil Slick or Volcanic Ash

May 3, 2010

I had a conversation with a person familiar with enterprise search. In the course of the ebb and flow, a metaphor surfaced, and I wanted to capture it before it slipped away.

The idea is that an environmental event or a human action can trigger big consequences. Anyone trying to fly out of Europe on April 16, 2010, learned quickly about ash plumes. Now the unlucky residents of the US Gulf Coast have an opportunity to understand the diffusion pattern of an oil release.

What’s this have to do with search?

The idea which struck me as interesting is that search is now having a similar impact on activities, processes, and ecosystems far removed from ground zero. I am not able to accomplish much of my “work” unless I can locate the program, file, information, and data I need. I don’t really do anything with physical objects. I live in a world of data and constructs built upon information. Sure, I have a computer and keyboard, and without those hardware gizmos, I would be dead in the water or maybe a sea of red ink?

The search eruption. Source: http://www.liv.ac.uk/science_eng_images/earth/research/VolcanicAsh.jpg

Search is now disappearing in some organizations, absorbed into other applications. One way to describe this shift is to use the phrase “search-enabled application”. Another approach is to talk about search as a utility or an embedded service.

Attensity Acquires Biz360

May 3, 2010

Attensity caught our attention with its no-cash mergers with some outfits in Germany. Now Attensity makes a more fungible move. Apparently data analysis and social media monitoring make good bedfellows. This particular acquisition provides for even deeper analysis. VentureBeat reports on the new acquisition in its article, “Attensity Picks Up Biz360 for Enhanced Social Media Monitoring.” Biz360 was a pioneer in the social monitoring world, focusing not just on collecting data but on true analysis of what it means, and together with Attensity it plans to provide an even deeper analysis to customers. This acquisition also provides the opportunity not only to monitor social conversations, but to engage and react to those conversations in real time. New features are being promised later this summer. This continues the trend in growing social media monitoring companies and their appeal to investors.

Melody K. Smith, May 3, 2010

Note: Post was not sponsored.

TEMIS and Endeca Form Enterprise Alliance

May 3, 2010

On Tuesday, April 27, 2010, a new partnership was launched between TEMIS, a leader in Text Analytics, and Endeca, a leading provider of search applications. TEMIS is being brought on board the Endeca Extend partner program for its ability to enrich the customer’s search experience through pre-built integrations. In TEMIS.com’s press release, the potency of the venture is explained as follows: “Integration of [TEMIS’] Luxid and Endeca empowers customers to take advantage of unstructured data within their organization to create unsurpassed user experiences.” It may sound like a lofty promise, but the fit of the two companies on paper is nearly seamless. Combining Endeca’s more than 250,000,000 end users accessing data across the globe with TEMIS’ strength of Guided Navigation, landing pages and improved search relevancy has the makings of an ideal fit. The real payoff for this partnership will be how well TEMIS and Endeca can differentiate themselves as the array of semantic features like Topics, Categories or Domain Knowledge are refined.

David Hardt, May 3, 2010

Unsponsored post.

The Courier Journal and Winning Horse Races

May 2, 2010

Post-Derby day. Sunday newspaper day. Depressing, and it is only 9 am.

A near miss in New York City excited the NPR news team this morning (May 2, 2010). Nary a word about Greece, Spain, and Portugal, however. To get the details, I had to fire up my laptop and check out online news sources.

I walked to the end of my driveway to retrieve the Courier-Journal, where I used to work. I also picked up my home delivery copy of the New York Times. The NYT was wet because the blue plastic bag was not closed, so water happily nestled in the newsprint. I could tell at a glance that the NYT closed before the problem in Times Square was news. I tossed the paper aside to dry.

The C-J was the ad section and the soft features. No front page. What was delivered dripped water on the kitchen floor. My wife told me to sort the newspaper in the garage. Fun. The Derby was yesterday and I was curious about the coverage of the event. Despite the nose dive in the original content in the C-J over the last 20 years, reporters do hoof and gallop around the Derby in search of “stories”. Well, mostly it is “who got rich,” “who showed up”, and “who got in trouble”. No joy. A call to the C-J’s hotline triggered a recording that told me there were production problems with the Sunday paper. No big deal. There’s online, Twitter, and Facebook. The story was online here “Production Problems Prevent Delivery of Full Sunday Courier Journal.” I wonder if there were cutbacks and efficiencies applied that made one of the highest circulation editions of the year fail? Like aircraft maintenance, no one knows what shifts have been made until the toilets don’t work, the flights can’t leave the gate, or the pilot reports a “slight issue and some paperwork”.

The one section of the C-J that showed up is called “Forum”, and what do you know? The front page of section H for Sunday, May 2, 2010, ran a story with this headline: “Rethink: Newspapers are better off than you may think.” The author is a fellow named Arnold Garson, whom I don’t know. His picture shows a kindly visage in dark suit with red tie and the slug: “The Courier Journal remains a strong and credible local news provider and a profitable business today.”

Since my Sunday paper was missing the front page, the sports section, and some other bits, I am not on board with the assertion about “a strong and credible local news provider.” I think the “profitable business” part is really the point.

I read the article, which purports to be the text of a speech delivered on April 7, 2010, to the Downtown Kiwanis Club meeting. The article is a long piece, running about 80 column inches. If Mr. Garson read this speech, I am delighted I was not in attendance.

Summarizing the talk is easy: C-J makes money, reaches more than 85 percent of the readers, and makes money. Oh, I repeated myself. Sorry, but that point jumps out a couple of times in the text of the talk.

I noted some other highlights as well:

  • The C-J is performing better than other newspapers; that is, “less bad” is “good”
  • Delivery of the hard copy to “outlying areas” has been trimmed
  • Ad rates and subscription prices are going up
  • TV news viewers are older than C-J newspaper readers
  • 100 million people read newspapers

You get the idea.

The C-J’s local Web site attracts 1.3 million unique users per month and generates 16 million page views. The C-J has achieved 380,000 mobile impressions per month. That’s good. The questions I had were:

  • What’s a “unique”? What’s a “page view”? What’s a “mobile impression”?
  • How does this compare with Facebook’s 400 million users in early 2010, up from 150 million in early 2009?
  • What’s the relationship between circulation decline and uptake of the C-J’s Web site?

I could crank out more questions, but I want to jump to the wrap up of the talk. This is the assertion I find most interesting:

Ninety-nine percent of the nation’s newspapers, including The Courier-Journal will survive this recession  based on our own core strengths, our determination to transform our business model and through the lift we will get from the recovery itself.

I am not sure how to make the leap from 99 percent survivability to “our own core strengths”. The core strengths seem to be advertising. I am not convinced the C-J does much local news. I understand determination. The assertion about the recovery seems to be a “maybe” argument. But it is tough to get coverage of the European financial crisis based on my reading the C-J every day. I have to turn elsewhere for that information.

Why do I think the talk is baloney? First, I fund the Seed2020 meet ups for women- and minority-owned businesses. I know that none of the more than 20 companies featured in the meet ups since November 2009 have been covered in the C-J. A couple of these businesses are real stories with solid news value. Nope. No coverage. One can argue that the weakening Business First, American Cities Business Journal publication is taking up the slack. Nope. The Seed2020 events show that there are solid news stories that are just not covered. I find the C-J argument on ground as muddy as the race track yesterday.

Second, without the C-J’s front page or the coverage of the NYT event in Times Square, I question the value of the newspaper as a timely source of information. Traditional deadlines and production problems underscore the irrelevance of the “business model” that will keep 99 percent of the newspapers in business. Mr. Garson does not provide any reference points for the number of newspapers in business in 1900, the number in 2000, and the number today. I do touch on this issue in Google: The Digital Gutenberg and won’t repeat the decline, consolidation, and homogeneity referenced in my monograph.

Third, the folks I know who are 55 and younger are not into newspapers. I watched how my son’s friends, now in their 30s, looked at the sports pages and their iPads and Macbooks. They talked to one another, chatted on their mobile devices, and sent text messages. This behavior took place as we sat at the kitchen table. The newspaper was marginalized.

Bottom-line: Timeliness, medium, and business model are intermingling with the DNA of people who don’t find the hard copy newspaper relevant. The C-J’s Arnold Garson is putting a positive spin on a reality that does not exist in our household.

Of course, I live in one of those outlying areas in Kentucky. I can log on to Newsnow.co.uk and learn about Europe. I can check Craigslist.com for ads. I can scan my Twitter stream to learn about the horrific accident that took place at Highway 42 and Highway 841 at 6:45 am.

No C-J needed for that. And I used to work there. Big changes to which the C-J and papers like the NYT are struggling to adapt. Like the long shots in the Derby yesterday, only one horse won. In my opinion, the C-J and the NYT are both entering the media race next year with long odds. Just my opinion and it is as valuable as a tip at the track.

Stephen E Arnold, May 2, 2010

Unsponsored post.

Designed to Sink: Aardvark and the Google Life Preserver

May 2, 2010

Aardvark’s owners referred to the company as “a sinking ship from day one” and they wouldn’t have had it any other way. A Gigaom article, “The Aardvark Theory of Product: Fake it Until you Make it,” details this startup’s shaky rise from directionless startup to its recent $50 million Google acquisition. The story is troubling because the Aardvark brain trust began without a product in mind, basically digitally tossing darts against a wall. The company tried dozens of different products until its social search program caught on and was promptly snapped up. We got the feeling this entire company was born to simply sell something, anything to Google or its competition. Are search products a failure if Google buys them?

Patrick Roland, May 2, 2010

Unsponsored post.

Filtering Travel Data Big Business

May 2, 2010

Sites like TripAdvisor have been in business for a while, but some smaller players like open standard Kayak.com are gaining ground, reports “Game-Changer for 2009: Trip Planning Web Sites”. From professionally-written sites like TripWolf to recommendation-providing Uptake.com, social search and “mountains of data” are being integrated to create personalized user experiences for your travel choices. There’s also Triporati, which profiles users and crawls the other big travel sites for recommendations. Many of these sites include their own social network-style interactions, or use Facebook Connect. Businesses need to realize that these sites are going to “actively redefine customer expectations” and consequently get their hotel or airline bubbled to the top.

Samuel Hartman, May 2, 2010

Note: Post not sponsored.
