
Punchfork: Recipe Search Becomes Popularity Game

September 9, 2011

It’s surprising that we’ve made it to 2011 without anyone doing this before now. Forbes shares the story “Punchfork Innovates Recipe Web Search Business Model with API & Social Data.”

They have taken the best morsels from what seems like millions of Julia Child inspired blogs and arranged them all in one comprehensive website. They’re called Punchfork.

Instead of relying on advertising for funds, they offer paid access to their API, which allows developers to use recipes from Punchfork’s publishers on their own website.

Unlike Yummly and Google, which collect recipes through semantics and algorithms, Punchfork aggregates its data through the social media websites we all frequent.

Forbes provided us with this nugget of information:

Punchfork founder Jeff Miller aggregates the recipe mentions of participating recipe publishers from across Twitter, Facebook, and Stumbleupon, and then rates recipes according to the number of shares… the number of times a recipe is shared across social networks as “social proof” that a given recipe is worth trying.

In a sense, Punchfork has democratized online recipes. Your voter registration is your login information to Facebook and your power lies within the share button.
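The share-count ranking Forbes describes can be illustrated with a short sketch. This is not Punchfork's code or API; the recipe names, the event list, and the `rank_by_shares` helper are all hypothetical, and the only idea taken from the quote is that a recipe's score is the number of times it was shared across networks.

```python
from collections import Counter

# Hypothetical share events as (recipe, network) pairs. Invented for illustration.
SHARE_EVENTS = [
    ("lemon-tart", "twitter"),
    ("lemon-tart", "facebook"),
    ("lemon-tart", "stumbleupon"),
    ("beef-stew", "facebook"),
    ("beef-stew", "twitter"),
    ("kale-chips", "twitter"),
]


def rank_by_shares(events):
    """Return (recipe, total_shares) pairs, most shared first."""
    # Count every share of a recipe, regardless of which network it came from.
    totals = Counter(recipe for recipe, _network in events)
    # Sort by count descending, then by name so ties come out in a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for recipe, shares in rank_by_shares(SHARE_EVENTS):
        print(f"{recipe}: {shares} shares")
```

A real service would presumably weight networks differently or discount old shares; the sketch treats every share as one vote, which is the "social proof" idea in its simplest form.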

This may very well revolutionize the way people search for recipes, or keep bloggers without friends at the bottom of the totem pole. Search is being consumerized just like Twinkies.

Megan Feil, September 9, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

More about SharePoint and Collaboration

September 9, 2011

Mass-produced web sites are usually ignored by our content wranglers, but this one slipped through the cracks from the questionable web site u4cash.com: “Enjoy Maximum Collaboration with the Help of SharePoint.” The opening paragraph reads like an advertisement for SharePoint and how it can offer custom software application development services.

SharePoint application development is an answer to those who were facing problem with their business requirements. Many people want assistance in the collection and the collaboration of the information from a certain firm or company. What SharePoint applications do is the customization, configuration and the development of Intranet, Extranet and the portals of information that are present on SharePoint.

The article explains the special features SharePoint offers users, such as document libraries to manage content, report generation, easy location, and sharing content across a wide network. Despite the article’s mass-produced quality, we got to thinking about collaboration vs. content governance/rich indexing. Is the need to connect users and all of their content greater than the need to control quality?

Our experience tells us that an abundance of users always creates different methods of indexing and organization. Today the “content confusion” is often described as a “governance problem.” We prefer the term editorial policy. Without an editorial policy, uncontrolled content inputs produce what we call a big mess. SharePoint needs to combine indexing and collaboration with a large focus on content findability enhancement.

To tame your SharePoint content findability challenges, we suggest you navigate to SurfRay.com and check out the Ontolica search and content processing system. You will still be well served by an editorial policy. But whether you have a full blown governance system or a basic SharePoint installation, SurfRay can help.

Stephen E Arnold, September 9, 2011

SurfRay

Google Seems to Be into Real Publishing

September 9, 2011

A couple of years ago I wrote my final monograph for Infonortics Ltd., a publisher which is based in the UK. Frankly I was tired of Google and wanted to do some poking into more interesting topics. I had material for several chapters about Google’s aspirations to be a force in rich media, which is where “real” publishing seems to be going. I took my cue from Google when it started to cut back on white papers and old fashioned text for the breezy, often content light videos about things Google. I chopped the video stuff from Google: The Digital Gutenberg and focused on content. Much in that three year old monograph is still relevant. I think Infonortics is still selling the monograph, but royalties have stopped flowing. Maybe Infonortics has quietly shut down. Who knows? Last Web address I had was www.infonortics.com.

The point is that Google is now a publisher. I assume from my redoubt in a land much enamored of cabbage that “Google Buys Zagat — Restaurant Ratings To Bolster Yelp Killer And Groupon Killer” is accurate. What strikes me is that Google is catering (no pun intended) to those who eat out. In some families, eating out is not an everyday event. But Google needs reviews and other types of content. Publishers may hit pay dirt if the Google acquisition machine pulls into their parking lot.

What caught my attention was not the purchase of Zagat. I wrote about Google as a next generation publisher years ago. Nope. The novel point in the write up for me is the use of the word “killer”. Most services don’t kill other services. My hunch is that the deal space is struggling with or without Google and its doppelganger Amazon. I also think the ratings sector is going to find some bare spots in the ski run.

If Google does get into the “killing” business, I think that even docile, charitable attorneys will have to think new thoughts about Google as the next best thing to a go round with Cornelius Vanderbilt about steamship and railroad rights. My view is that hooking “Google” and “kill” may do some semantic damage to the Google.

But traffic is important, so maybe the purchase of a long in the tooth, quirky source of food guidance for the person with a job or solid billability is an attention grabber. My view is that this is interesting but not significant news. For more on the “digital Gutenberg”, chase down a copy of my monograph from 2008.

Stephen E Arnold, September 9, 2011

Sponsored by Pandia.com, a publisher still in business

OpenText: Search to Teaching Is Not the Deal about Selling Services?

September 8, 2011

Another data management, search, collaboration vendor does the “we are in a new business” quick step. Searching with the Stars could be a TV sensation because there are more twists, dips, and slides in the search and content ballroom than in an Arthur Murray Advanced Beginners’ class.

Navigate to “Open Text acquires Peterborough’s Operitel”. The news is that one Canadian firm snapped up another Canadian outfit. What makes this interesting is that I was able to see some weak-force synergy between Nstein (sort of indexing and sort of data management) and OpenText, owner of lots of search, content processing, and collaboration stuff plus an SGML database and the BASIS system. But the Operitel buy has me doing some speculative thinking.

Here’s the passage which caught my attention:

Operitel’s flagship LearnFlex product is built on Microsoft Corp.’s .NET platform and is a top tier e-learning reseller for the Windows maker. Open Text also has a long standing partnership with Redmond, Wash.-based Microsoft.

I see more Microsoft credibility and a different way to sell services. OpenText strikes me as a company with a loosely or mostly non integrated line up of products. The future looks to be charging into the SharePoint sector, riding a horse called “eLearning.”

In today’s business climate, organic growth seems to be tough to achieve even with RedDot and a fruit basket filled with other technologies. (What happened to OpenText’s collaboration product? What happened to the legal workflow business? I just don’t know.) So how does a company which some Canadians at Industry Canada see as one of the country’s most important software companies grow? Here’s the answer:

Open Text’s growth-by-acquisition strategy has recently won accolades among the analyst community. The company purchased Maryland-based Metastorm Inc. for US$182-million, Texas-based Global 360 Holding Corp. for US$260-million and U.K.-based WeComm Ltd. for an undisclosed amount all in the past six months.

My hunch is that OpenText may want to find a buyer. Acquisitions seem to be a heck of a lot easier to complete than landing a major new account. I am not the only person thinking that the business of OpenText is cashing out. Point your browser at “Amid Takeover Fever, Open Text Looks Like a Bargain.” Here’s a key point in my opinion:

Open Text shares have climbed about 20 per cent this year, an increase that would pale in comparison to what would happen if a potential buyer emerged offering a premium similar to what HP has given Autonomy.

So we see a big payday for Autonomy has triggered a sympathetic response at the Globe & Mail, among “analysts”, and I am pretty sure among some OpenText stakeholders.

Several observations:

First, bankers think mostly about their commissions and fees. Bankers don’t think so much about other aspects of a deal. If there is a buck to be made from a company with a burlap sack of individual solutions and services, the bankers will go for it. Owning a new Porsche takes the edge off the winter.

Second, competitors have learned that other companies are a far greater threat than OpenText. A services firm can snag some revenue, but other vendors have been winning the big deals. The OpenText strategy has not generated the top line revenue growth and profit that a handful of other companies in search and content processing have achieved. So the roll up and services play looks like a way to add some zip to the burlap bag’s contents.

Third, customers have learned that OpenText does not move with the agility of some other firms. I would not use the word “glacial,” but “stately” seems appropriate. If you know someone with the RedDot system, you may be able to relate to the notion of rapid bug fixes and point releases. By the way, RedDot used to install an Autonomy stub as the default search engine. I find this interesting because OpenText owns BRS search, Fulcrum (yikes!), and the original Tim Bray SGML data management and search system. (Have SGML and XML come and gone?)

I am not willing to go out on a limb about a potential sale of OpenText, but I think that the notion of eLearning is interesting. Will OpenText shift its focus back to collaboration and document management, much as Coveo flipped from search to mobile search to customer support and then back to search again? Canadian search and content processing vendors are interesting. Strike up the music. Another fast dance is beginning. Grab your partner. Search to services up next.

Stephen E Arnold, September 9, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Enterprise Search: Half Still Annoyed

September 8, 2011

A couple of years ago, Martin White and I wrote Successful Enterprise Search Management. We spoke with various colleagues, reviewed the outputs of consulting firms, and we even listened to some of the experts on the conference circuit. Martin and I reported that somewhere between half and two thirds of users of enterprise search systems were dissatisfied with those systems. Our work was echoed by a number of MBAs, a few failed webmasters, and at least one New York City consulting firm that uses this blog to train newly hired “search experts.”

If you don’t believe Martin and me, you can read “Continues to Disappoint and Frustrate Users.” Here’s the key passage:

In a new study from MindMetre Research and sponsored by SmartLogic finds that over 52 percent of users report that they typically are unable to find what they are searching for using enterprise search within an acceptable amount of time.   The report found that most users could accept a search taking two minutes, but searches which exceed four minutes are considered unacceptable by most users.  Only 48 percent of respondents that use enterprise search said that they typically could perform searches under two minutes.

Set aside the “real” consultants and the trade associations which are floundering for traction. The data point to a “findability” challenge.


Enterprise search can produce some fascinating situations. “Real” consultants, poobahs, and unemployed art history majors can ruin a perfectly fine work experience with their inability to anticipate consequences. A birthday without informed guidance can be quite interesting. Image source: http://www.flickr.com/photos/design-dog/4357801313/

I quite enjoy sitting in rural Kentucky, watching the “real” experts explain how to improve, enhance, and magnetize the “user experience.” The present situation is that hand waving cannot disguise three issues which are often an adult version of a children’s birthday party food fight:

Read more

Hlava on Machine Assisted Indexing

September 8, 2011

On September 7, 2011, I interviewed Margie Hlava, president and co-founder of Access Innovations. Access Innovations has been delivering professional taxonomy, indexing, and consulting services to organizations worldwide for more than 30 years. In our first interview, Ms. Hlava discussed the need for standards and the costs associated with flawed controlled term lists and some loosely-formed indexing methods.

In this podcast, I spoke with her about her MAI or machine assisted indexing technology. The idea is that automated systems can tag in a consistent manner high volume flows of data. The “big data” challenge often creates significant performance problems for some content processing systems. MAI balances high speed processing with the ability to accommodate the inevitable “language drift” that is a natural part of human content generation.

In this interview, Ms. Hlava discusses:

  • The value of a neutral format so that content and tags can be easily repurposed
  • The importance of metadata enrichment, which allows an indexing process to capture the nuances of meaning as well as the tagging required to allow a user to “zoom” to a specific location in a document, pinpoint the entities in a document, and generate automated summaries of documents
  • The role of an inverted index versus the tagging of records with a controlled vocabulary.
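The contrast in the last bullet can be shown with a toy sketch. It is not Access Innovations' MAI technology; the sample documents, the two-entry vocabulary, and the function names are hypothetical. The inverted index maps each literal word to the documents containing it, while the controlled vocabulary maps variant phrases to one preferred term, so documents that use different wording for the same concept receive the same tag.

```python
import re
from collections import defaultdict

# Hypothetical sample documents, invented for illustration.
DOCS = {
    1: "Heart attack risk rises with age.",
    2: "Myocardial infarction treatment guidelines.",
    3: "Search latency in enterprise systems.",
}

# Hypothetical controlled vocabulary: variant phrase -> preferred term.
VOCABULARY = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
}


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index


def tag_documents(docs, vocabulary):
    """Assign each document the preferred terms whose variants appear in it."""
    tags = {}
    for doc_id, text in docs.items():
        lowered = text.lower()
        tags[doc_id] = {
            preferred
            for variant, preferred in vocabulary.items()
            if variant in lowered
        }
    return tags


def search_by_tag(tags, term):
    """Return the sorted ids of documents tagged with a preferred term."""
    return sorted(doc_id for doc_id, doc_tags in tags.items() if term in doc_tags)


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    print(sorted(index["heart"]))  # only the document that uses the word itself
    tags = tag_documents(DOCS, VOCABULARY)
    print(search_by_tag(tags, "Myocardial Infarction"))  # both wordings match
```

A keyword lookup for "heart" finds only the first document, while a lookup on the preferred term finds both. That gap is one way flawed or missing indexing turns into user dissatisfaction.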

One of the key points is that flawed indexing contributes to user dissatisfaction with some search and retrieval systems. She said, “Search is like standing in line for a cold drink on a hot day. No matter how good the drink, there will be some dissatisfaction with the wait, the length of the line, and the process itself.”

You can listen to the second podcast, recorded on August 31, 2011, by pointing your browser to http://arnoldit.com/podcasts/. You can get additional information about Access Innovations at this link. The company publishes Taxodiary, a highly regarded Web log about indexing and taxonomy related topics.

Stephen E Arnold, September 8, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Catching Up with Visual Bing

September 8, 2011

We are not tracking visual search with the assiduity we use for POTS or plain old text search. We use a system to show us approximate matches and then we browse. Visual search is, for us here at the goose pond, more like a “close enough for horse shoes” experience than a search experience.

We do want to document the fact that “Bing Delves Deeper into Visual Search,” announces Search Engine Journal.

We’re now used to getting images and videos in our search results. These projects at Bing, however, aim to put pictures at the other end of the process. Writer Rob D. Young explains:

Bing currently has 88 ‘visual searches.’ Those searches range from the top books to dog breeds to yoga poses and well beyond. Each of these searches comes with an advanced left navigation that lets you see only the images and info that interests you. The Yoga Poses visual search, for example, lets you choose the level of difficulty, the therapeutic purpose, the targeted anatomical area, and more.

Optimization for visual search, says Young, is different from that for text-based searches. Bing taps third parties to decide what content is worthwhile. I think we’ll have to keep an eye on those low-visibility players.

Young envisions a time when the public will be able to create and curate these visual searches. I’m looking forward to it, but my colleagues here at Beyond Search are not impressed. Visual images drag along some interesting copyright and fair use issues. If we use an image for our free blog, we try to provide a link to the source of the image and a happy quacking thank you. If someone objects, we delete the image. Will image search improve by leaps and bounds? Nope, more like a few tentative waddles, then a bit of a rest.

Cynthia Murrell, September 8, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Possible Changes Ahead for eDiscovery Rules

September 8, 2011

E-discovery 2.0 asks, “New eDiscovery Rules on the Horizon?” Potential amendments to the Federal Rules of Civil Procedure are to be discussed at a mini-conference scheduled for September 9, 2011 by the Advisory Committee on Civil Rules. Writer Matthew Nelson explains the significance of this meeting:

The mini-conference is important because it is part of a seven step process that could ultimately lead to new rule amendments affecting all litigators and the organizations they represent.  Any new rule proposals developed by the subcommittee at the September mini-conference will be considered by the Advisory Committee this November in Washington D.C.   The proposals, in one form or another, could ultimately become law.  Both Supreme Court and Congressional approval are ultimately required.

One area that cries out to be addressed is the controversial question: at what point does the duty to preserve evidence kick in? If the answer is when a complaint is served, that may leave too much leeway for evidence destruction at the first sign of a potential complaint.

Many feel that the current rules are too murky, making companies anxious about what they must do to avoid future sanctions. Further complicating the picture are questions about the impact of cloud computing on civil litigation.

We’re just at the beginning of the long process of amending these rules. If your business is concerned with eDiscovery, though, you’ll want to keep up on the progress.

Cynthia Murrell, September 8, 2011

Sponsored by Pandia.com

When Social and Search Meet in the Enterprise

September 8, 2011

Organizations are embracing Microsoft SharePoint as a platform for collaboration and other social online messaging. “If You Must Have In-House Social Tools, Go with SharePoint” is representative of the flood of information about SharePoint’s utility for collaborative activities.

J. Peter Bruzzese said:

The good news, at least from the SharePoint perspective, is that you have a tremendous amount of control over the amount of information people can share. For example, by deploying the User Profile Service Application in a SharePoint server farm, you can deploy My Sites and My Profile options to your users. They can then enter their own profile information, upload images of themselves for a profile picture, create a personal page with a document library (both personal and shared), tag other people’s sites and information, and search for people within the organization based on their profiles. The SharePoint administrator can control the extent to which the sharing occurs. You can adjust the properties in the profile page, turning options on or off and adding new properties if needed. You can turn off the I Like It and Tags & Notes features, and you can even delete tags or notes your corporate policy disapproves of. You can access profile information and make changes if needed. And you don’t have to turn on My Sites or let people create their own blog and so on: It’s not an all-or-nothing situation with these tools (ditto with third-party tools).

The excellent write up does a good job of explaining SharePoint from a high level.

There are three points which one wants to keep in mind:

First, collaborative content puts additional emphasis on managing the content generated by the users of social components within SharePoint. In most cases, short messages are not an issue. What is important, however, is capturing as much information about the information as possible. One cannot rely on users to provide context for some comments. Not surprisingly, additional work is needed to ensure that social messages have sufficient context to make the information in a short message meaningful to a person who may be reviewing a number of documents of greater length. To implement this type of feature, a SharePoint licensee will want to have access to systems, methods, and experts familiar with context enhancement, not just key word indexing.

Second, the social content is often free flowing. The engineering for a “plain vanilla” SharePoint is often sufficiently robust to handle typical office documents. However, if a high volume flow of social content is produced within SharePoint, “plain vanilla” implementations may exhibit some slow downs. Again, throwing hardware at a problem may work in certain situations but often additional modifications to SharePoint may be required to deliver the performance users expect. Searching for a social message with a key fact can be frustrating if the system imposes high latency.

Finally, social content is assumed to be a combination of real time back and forth as well as asynchronous. A person may see a posting or a document and then reply an hour or a day later. Adding metadata and servers will not address the challenge of processing social content in a timely manner. Firms with specific expertise in search and content processing can help. The approach to bottleneck issues in indexing, for example, relies on the experience of the engineer, not an FAQ from Microsoft or blog post from a SharePoint specialist.

If you want to optimize your SharePoint system for social content and make that content findable, take a look at the services available from Search Technologies. We have deep experience with the full range of SharePoint search solutions, including Fast Search.

Iain Fletcher, September 8, 2011

Sponsored by Search Technologies

Study Shows Majority is Less Popular than Believed

September 8, 2011

Now we have a 10 percent rule. Wow.

It is not often that I question the conclusion of credible research, but it happens.  I couldn’t be such a self-professed lover of science if I abandoned the whole premise of critical thinking, but I digress.

“Minority Rules: Scientists Discover Tipping Point for the Spread of Ideas” is a fascinating read. For the sake of debate, I will quote the findings directly as stated in the press release:

“Scientists at Rensselaer Polytechnic Institute have found that when just 10 percent of the population holds an unshakable belief, their belief will always be adopted by the majority of the society.”

Intuitively, this makes perfect sense. The idea of the workings of group mentality has been drilled into psych 101 students for decades; it’s textbook human behavior and frequently observed in reality. It is no coincidence that the political rhetoric exhibited on the 24-hour news cycle is unwavering in its repetitiveness. This latest discovery is merely a quantifying extension of an established idea.

Despite this, I am still wrestling with the concept as presented above.  Call my complaints overly finicky, but phrasing is important.  First, ‘always’ is a dangerous term.  Always.  Further, I find ‘society’ to be too vague to be meaningful.  Another issue I am having is the lack of time-scale definition for the shift to occur, and failure to mention the necessity of consistent communication of the lesser held belief.

An example: after a spot check of polls for confirmation, I believe it is safe to say at least 10% of the world’s population, or any smaller subset, generally is against war engagement in any circumstance.  I speculate this has been a consistent viewpoint throughout time, though the ruling majority has yet to bend to the will of the committed minority.  On the off chance world peace is achieved in the next two to ten centuries… does that count?

Could the prime proof of validity, the recent turn of regimes in the Middle Eastern regions, have occurred at such a pace without the help of online social media?

Barring semantics, I am behind the underlying principle and look forward to hearing advancement of the theory.  Applied to advertising and politics, once again we are shown manipulation of the general public is nearly effortless.  All the more reason to stick with critical thinking.

What’s the impact on search? If algorithms have a 10 percent threshold, the results will not reflect popularity but the biases of a minority of users. Black box algorithms are interesting in this context.

Sarah Rogers, September 8, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion
