Gray Lady Limping: A Troubled New York Times?

December 16, 2011

I don’t want to draw parallels between the management shifts at Thomson Reuters and the New York Times. Let me document the fact that another semi-surprise hit the struggling New York Times. Navigate to “NY Times CEO exiting, without Explanation.”

The Times Co gave no explanation for Robinson’s sudden departure, which caught analysts as well as company insiders by surprise. Speculation among industry observers and the analyst community centered on the company’s faltering stock price, which has declined more than 80 percent since Robinson was appointed CEO in December 2004. This year alone, shares are down nearly 25 percent, a performance that has frustrated investors.

Also interesting was the departure of Martin Nisenholtz, the person who has matched the dismal performance of the Financial Times’s online services. After pulling the New York Times from LexisNexis, the New York Times demonstrated that it was unable to generate big dough when it came to leveraging its brand in the online world. I view the misguided handling of the LexisNexis deal as the first benchmark in the Times’s fascinating financial decline. Business school case study, anyone? Trace the path from LexisNexis to the first Times online service to the current line up of services to the fumbling of its own indexing to the handling of About.com to today. Yowza. I am glad I am in rural Kentucky, semi retired, hopelessly confused, and no longer working in the newspaper industry. Anyone hear the sound of dead trees falling in the forest? When you walk alone and get lost, you can spend quite a while in the wilderness. Watch out. Here comes another dead tree falling.

Stephen E Arnold, December 16, 2011

Sponsored by Pandia.com

Quote to Note: PowerPoint

December 16, 2011

I encounter PowerPoint “decks” when indexing enterprise content. I should emphasize the plural. The bane of PowerPoint is that users skip over the metadata. When indexing an enterprise corpus, there are lots of versions of a particular PowerPoint deck. To make matters more interesting, some decks include confidential information. Running a query on a PowerPoint collection without figuring out versions, duplicates, access rights, and date and time conflicts makes for a long spell of opening, scanning, closing with the cycle repeated many times.
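The first pass at the duplicate problem can be sketched in a few lines of Python. This is a minimal illustration, not a description of any vendor's method: it assumes the decks sit in a file system folder, and the function names (`file_digest`, `group_exact_duplicates`) and the `*.ppt*` pattern are made up for the example. It catches only byte-identical copies. Near-duplicate versions, access rights, and date conflicts still need separate handling.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_duplicates(root: Path, pattern: str = "*.ppt*") -> dict[str, list[Path]]:
    """Map each content digest to the deck files under root that share it.

    The pattern is a hypothetical default meant to match .ppt, .pptx and .pptm.
    """
    groups = defaultdict(list)
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return dict(groups)
```

Any digest that maps to more than one path marks a set of identical decks, so only one copy from each set needs to be opened and scanned.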

The quote appeared in the write up “PowerPoint Alternative Closes $14 Million Funding.” (Note: this is a Murdoch Wall Street Journal link which can go dark without much warning.)

If you have ever sat through a death-by-PowerPoint presentation (once described by commentator Michael Bywater as “the most loathsome, vicious and immoral piece of software ever produced.”)

I find the sequence loathsome, vicious and immoral fascinating. Software, not its users, is loathsome, vicious, and immoral. Hmmm. Software, not the users. I want a T shirt with the phrase printed across the chest area. Quite a conversation starter I wager.

Stephen E Arnold, December 16, 2011

Isys: Eliminating Search Speed Bumps

December 15, 2011

I thought speed bumps were sleeping policemen. ISYS Search tackles them. Thump. Squish. Navigate to “Isys Tackles Enterprise Search Speed Bumps.” The idea is that Isys can make a thorny findability problem a non issue. According to the write up:

The new version features ISYS 1-Click File Finder indexing, analytics and search technology, as well there are seven major new application features. ISYS Research Accelerator is a customizable interface that lets business users search and refine the results the way they want – and make the information easily available and actionable for others. ISYS Information Map offers an advanced visual navigation tool that lets business users see and explore the links between pieces of information. The new Timeline Refinement Bar makes large results sets easy to navigate and ensures users know they’re getting to the most accurate and recent versions of documents. ISYS Enterprise 10.0 introduces Multi-core Indexing, which promises to significantly improve indexing speed and robustness, with multiple ‘worker tasks’ able to handle unlimited filename lengths and unlimited document container depths. Users can now view common document formats (like MS Office, Adobe PDF) the way they were intended, with full layout, fonts, images and hit-highlighting. ISYS Enterprise 10.0 can search by document type extension across 400+ document, file and email types. Also, there are native 32-bit and 64-bit Server Versions to allow organizations to make use of their existing hardware.

The write up did not include information about license fees, visualization, extensibility, application programming interfaces, and customer support options. You may want to contact the company for these details. I did not include Isys in my “The New Landscape of Enterprise Search.” The company hit my radar with its connector licensing strategy, which struck me as an interesting idea. This new release reminds me that Isys is in the enterprise search market. That sector is in flux with other vendors repositioning themselves, throwing around buzzwords, and reinventing themselves as big data analytics companies. Isys is using the lingo of a more traditional, pre mobile app approach to enterprise search. You can get more information at the Isys Information Center. One question: Will I get a ticket for speeding down the enterprise information highway with the goddess of motherhood, magic and fertility? Kentucky is a pretty conservative place.

Stephen E Arnold, December 15, 2011

Taxonomy: More Marketing Craziness in Play?

December 12, 2011

For whatever reason, I have been picking up rumors, factoids, and complaints about the sales and marketing tactics of various search and content processing vendors. With holidays just around the corner, one would think that in the run up to Kwanzaa, Christmas, Hanukkah, and Boxing Day folks would chill.

Ah, Agility!

The first dust up concerns tag lines. At issue is the word “agile”, which is becoming one of the more popular terms. I was in a meeting at which a heated discussion broke out about whose search and content processing system is agile. Endeca claims agility. I am not going to dispute that a 13 or 14 year old system is agile, but in Internet years, there may be some flexibility lost. Run a query for “agile” and “search” and you get a hit to a recruitment firm, a marketing outfit, and something called the Tamilan Search Engine. I also spotted PolySpot, a French infrastructure, solutions, and applications company. The problem is that words are slippery. What are the synonyms for “agile”? I expect to see some of these turning up in 2012. How about gazelle search or spry search?

In tough economic times, financial pressures can distort business methods.

Circular Partnerships: Snakes Eating Their Tails

The second dust up concerns partnerships. I have been looking through the list of partners identified by such companies as Microsoft, WAND, and others. What I have discovered is that most of the partners are either household names like IBM or companies I have never heard of. Furthermore, when I dig into the partners’ names unfamiliar to me, I discover companies which are consulting firms or resellers who offer a roster of “stuff.” I understand the importance of amplifying a sales force. A partnership plan is little more than a way to reduce the cost of getting a lead and making a sales call. One of the experts in this game is the struggling giant Thomson Reuters. The company signs up partners when sales flag. In the taxonomy game, the partnerships have another twist. The linkages are circular. Antidot or Modeca points to partners and partners point to other search and content processing vendors which point to the original company. I find this confusing because “partner plays” are gaining momentum among specialist firms. I think the “partner” card is an indication that a search and content processing firm may be beating the bushes to get revenue. Just my opinion, of course.

Today, everything is for sale. Be wary if a pitch sounds too good to be true. Image source: http://asksistermarymartha.blogspot.com/2009_10_01_archive.html

Pitching Automation No Matter the Consequences

The third dust up involves taxonomies and is related to the circular nature of partnerships and financial pressures. Now there is considerable contention in the market with regard to taxonomies. The word “taxonomy” itself is a shuttlecock with software badminton players swinging with abandon. The idea is simple: A hierarchical word list. But there are hot new spins like ontology (not to be confused with the branch of metaphysics that deals with the nature of being), metatagging, and categorization.

On one side of the dictionary are those who want the software to discover the concepts, terms, and bound phrases. Then these terms are automatically assigned to content processed by the system. If this sounds like the Bayesian magic associated with Hewlett Packard Autonomy or Recommend, you are on the money. There are alternative approaches which have considerable payoff. A good example is the work done by Tim Estes and his team at Digital Reasoning, a firm which received financial goodness from SilverLake Sumeru. The idea is that humans play either a modest role or no role at all. Because of the volume of data flowing through a system, human intermediated systems struggle to keep pace with the fluidity of human discourse. On one side, therefore, automation. For simplicity’s sake, let’s call this the Google approach.

On the other side of the dictionary are those who see humans with subject matter expertise playing an important role. The idea, which seems quaint to many of the self appointed experts and azure chip consultants, is that human beings can set up a conceptual scheme and populate it with words, terms, and bound phrases. Thus, armed with a controlled term list, a system can use those terms to index or tag content. The idea has merit because the American National Standards Institute has spelled out guidelines for controlled term lists.

Here’s how the battle shapes up. On one side is the “we don’t need any humans” crowd. In my opinion, some enthusiasts for this no-humans position are TEMIS, Google, and in some cases Autonomy. Many of the automated indexing and tagging systems work quite well when the corpus of content is tightly bounded. What do I mean by “tightly bounded?” Pick up a hard copy of a medical journal about cancer or about nuclear engineering. The vocabulary does not vary too much from article to article within each topic area. In fact, once you learn about 2,000 nuclear terms, you can figure out the basic idea of most nuclear power write ups.

Are some search and content processing vendors taking notice of sales methods associated with used car sales professionals? Even Google is advertising on the “vast wasteland”. Image source: http://www.townhillautosales.com/?24

What happens when you process unbounded content? Well, real life language use is trickier. Non experts simplify complex ideas, often substituting non specialist terms for arcane jargon. Do you know what an ECCS is? Probably not. A “real” journalist or consultant will convert the notion of an emergency core cooling system to something along the lines of a “spare radiator.” Not exactly on the money, but indicative of how precise language is softened. In these situations, it is useful to have a term list of the specialist words, terms, and bound phrases. Subcategories under Cooling Systems can contain the ECCS entry and others. The idea is that content can be assigned certain terms no matter what the words and phrases in the source document may be.
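Here is a rough sketch of how a controlled term list can drive tagging, written in Python. Everything in it is an assumption made for illustration: the `TERM_LIST` entries, the variant phrasings, and the function names are hypothetical, and a production system built to the ANSI guidelines would carry far richer rules. The point is only that “spare radiator” and “ECCS” land on the same preferred term under Cooling Systems.

```python
import re

# Hypothetical controlled term list: preferred term -> broader category and variants.
TERM_LIST = {
    "Emergency core cooling system": {
        "broader": "Cooling Systems",
        "variants": ["emergency core cooling system", "ECCS", "spare radiator"],
    },
    "Reactor pressure vessel": {
        "broader": "Reactor Components",
        "variants": ["reactor pressure vessel", "RPV", "reactor vessel"],
    },
}


def build_matchers(term_list):
    """Compile one case-insensitive, whole-word pattern per preferred term."""
    matchers = []
    for preferred, entry in term_list.items():
        alternatives = "|".join(re.escape(variant) for variant in entry["variants"])
        pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
        matchers.append((preferred, entry["broader"], pattern))
    return matchers


def assign_terms(text, matchers):
    """Return (broader category, preferred term) pairs whose variants appear in text."""
    return [
        (broader, preferred)
        for preferred, broader, pattern in matchers
        if pattern.search(text)
    ]


if __name__ == "__main__":
    matchers = build_matchers(TERM_LIST)
    print(assign_terms("Operators said the spare radiator kicked in.", matchers))
```

A subject matter expert maintains the variants. When journalists invent a new softened phrase, someone adds it to the list and the index follows.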

Some companies like TEMIS, Google, and Yandex are not too keen on the human involvement. The reasons range from the cost of getting humans to do indexing and taxonomy development to an arrogance about how software performs. Wizards see the world in terms of their wizardry which is okay with me. I think it is silly to assume software can handle language with the facility of humans, but I have some experience with what happens when “good enough” is not.

Other companies like Access Innovations (a former client from days of yore) and (believe it or not) Dow Jones (a component of the exciting Murdoch organization) believe that humans are important. The humans can develop the lists, set up guidelines or rules for the indexing system to consult, and provide interfaces to allow subject matter experts to adjust the term list and tune the indexing system. The benefit is that the accuracy of the indexing, based on my real life experience, is much better. There is language drift, but there are methods to intervene and correct that drift.

Without a method to adjust to what software is too stupid to see, the indexing “drifts”. The impact of this is not too good. You run a query for a particular snake bite treatment, and you cannot locate the content. The term you use is not assigned by the system and it does not appear in the source document. So what? Well, how about your child dies. Maybe this is an unpleasant thought, but the consequences of lousy indexing and concept assignment are often more serious than not finding a pizza joint in San Jose.

Here’s what one indexing professional told me. I have to mask the name and company to avoid a hassle, but you will get the idea from this comment I captured:

Some companies such as a certain Paris-based company sell expensive software to clients and then leave. People don’t know what to do with it.  So they have an expensive difficult to implement natural language processing systems which could work but are left hanging.  The package from us is the whole thing we are big on total service, follow up training, and getting people implemented and using it without our help but we are there – just a phone call or email away to help and support them. The Paris based company says companies like Access Innovations are not a natural language processing system and although we do have the natural language processing  we don’t make people pay for it separately. With most systems, rules are often needed to achieve more than “good enough” tagging.  Access Innovations, a specialist able to generate ANSI compliant term lists, delivers 85 to 90 percent accuracy. The Paris-based outfit delivers far lower accuracy. Clients don’t understand the issues with low accuracy tagging, findability, and long term system usability.

So What?

What we have, gentle reader, is an example of the automation crowd glossing over the need for human-intermediation solutions. What disturbs me is that the chatter about taxonomy in boot camps, companies which are coming from left field, and self appointed experts is putting the spotlight on indexing and classifying content.

That’s a plus.

The downside is that when the indexing goes off the rails, the user may not be able to find the needed information. That’s why companies like Digital Reasoning and Access Innovations have the ability to deliver automation plus human-intermediated interactions. The licensee suffers when automation goes wrong. The users suffer. The search system vendor may be blamed. Beware the taxonomy vendor spouting glittering generalities about smart software. Usually the “spout” dispenses tainted outputs.

Bottom line: I avoid vendors who present to me the “one true way.” This approach may work when preparing foie gras. For some taxonomy vendors hungry for cash, the traditional, labor intensive methods get in the way of making a quick sale. Unfortunately when humans create language, more traditional methods are often completely appropriate for mission critical indexing tasks. Honk!

Stephen E Arnold, December 12, 2011

Big Data a Bane of Small Businesses

December 8, 2011

Is anyone really surprised? “Big Data Strains Small-Business Bandwidth,” announces InfoWorld. Apparently this is news to some folks. Since Thanksgiving, a time to celebrate unemployed English majors and failed azure chip search consultants, I have been involved in four separate meetings about big data. To be fair, each of these meetings talked about the perception of big data, not actually whipping around a couple of copies of the Internet or a year’s worth of Twitter and Facebook gold ore.

InfoWorld is pretty excited about big data. We learned from the write up that some folks thought storage would be the biggest hurdle small businesses would face when wrangling large amounts of data. Not so, reports writer Matt Prigge. The article asserts:

Storage vendors seem to be doing a great job staying on top of the demand for ever larger data densities and software to allow you to make more efficient use of it (think dedupe and intelligent thin provisioning). But for the most part, you can’t say the same about the telcos and ISPs providing the wide area networks we’re using to acquire and share that data.

The problem is worst in rural areas, where expensive solutions like DS3, SONET, ATM, and Metro-Ethernet are simply not available. Many businesses turn to the cloud, but that won’t work for companies with certain conditions, like highly graphical work. Besides, you have to be very confident in your Internet service provider to rely on hosting services. The solution? Some companies just have to pack up and move (back) to the big city.

Yes, everything works well when there is unlimited bandwidth, unlimited technical resources, and unlimited infrastructure. The real world is different from the Ivory Tower, however. Three observations:

  1. Big data usually skips over the issue of latency. There are different definitions of real time in indexing big data. Defining terms is a useful first step.
  2. Most of the big data chatter is marketing. You, gentle reader, should know what marketing means: sizzle, not sirloin.
  3. Talking about big data is different from processing in an operational mode real time flows of content from social systems, mobile phone usage reports, etc. But talk is cheap and easy. Big data is neither.

Cynthia Murrell, December 8, 2011

Enterprise Search: You Know You Are in Trouble When

December 7, 2011

When I was jammed between two less than svelte individuals on a flight from Dallas to Louisville, I read a chubby article called “IT Inferno: The Nine Circles of IT Hell.” My view of the write up was that the author walked to the end of the pier and fell into the water off the shore at one of the beaches south of Rio’s harbor. Yikes.

I then noted “Six Lessons from a Lightning ERP Rollout.” I think the main idea is that if one goes fast, the outcome of an enterprise resource planning project is going to be better. Speed does only good, which will bring joy to the wanna be F1 and Nascar drivers in information technology. I found this statement interesting:

ERP implementations have gained a bad reputation, in which merely late is considered very good, and spiraling out of control is considered common. There are always more ways of doing something wrong than doing something right. Beyond that, the act of defining an effort as an ERP implementation contributes to the likelihood of disappointing results.

The idea in these two write ups was a good one. Make some mistakes and end up caught in a Dante-esque world. In theory one could emerge in the land of paradise, but few make it. As far as I know, Dante, whose house in Florence looked over a street in which all manner of technical and non technical activities unfolded, did not think much about information technology, online systems, and Service Level Agreements with $100 billion consulting outfits like IBM. No Watson was in the neighborhood to assist Dante with a thorny question.

I want to ignore the reference to Dante and focus on a handful of ideas in the six page write up. Furthermore, I want to sidestep the generalizations about large scale information system projects in organizations and narrow my focus to search. I won’t even draft business intelligence or the other 13 euphemisms for search I wrote about a few weeks ago. (I picked up this theme and drilled into some specific companies in my December submission to the January or February Information Today for which I write columns for money. Amazing, I know.)

So, no big picture, no Dante, and certainly no theological overtones. I remarked at a speech in November 2011 that I once studied and indexed medieval religious literature. Got a laugh. Too bad it is true. Now you know why I poke mercilessly at English majors, failed high school teachers, and self appointed experts. The goose is talking about his guru-ness or, I should say, his being a goose-ru.

So, you know you are in trouble when:

  1. You cannot control your budget for search. You will know when this occurs because the organizational equivalent of a sixth grade Catholic school teacher armed with a wooden pointer and a habit (the good kind) tells you to stay after class. The CFO watches the money, and she will bring you up to speed on the magnitude of the problem you have created. Numbers make clear that a search project has gone south. I can hear the snap of the wooden pointer now. Ouch.
  2. Users work around the system. Now most information technology professionals deny that users are unhappy with any aspect of a system which is actually online and functioning to some degree. However, when we ask users about search systems, the message we receive from surveys and interviews is easy to understand: Search is not too useful. The proof, however, is not a survey. Just ask users what they do to find information and you will learn about intra-company networks, bootleg systems, and even hosted services which poke through the firewall to index content. The sticky notes are a dead giveaway as well.
  3. The infrastructure is not able to keep pace with indexing. We hear a lot of baloney about how fast a search system is, how small an index’s overhead will be, and how little latency exists within a search system. The reality is that most search systems cannot provide near real time index updates. As a result, there are numerous instances of old information used for proposals, financial summaries, and marketing reports. How does a cash strapped search system manager keep pace with burgeoning digital content? Easy. Index less and update the index on a relaxed schedule. Don’t believe me? Where is the marketing PowerPoint for the presentation given yesterday by the marketing VP? I get these materials by asking the person to give it to me on a storage device or dump it in an online file sharing service. That sometimes works. Search usually does not work.
  4. The vendor changes his marketing tune. Here’s how this goes. You license a search system from a company which asserts that it is 100 percent committed to enterprise search. Then you read in Beyond Search that the company is in customer support, information optimization (whatever that means), business intelligence (an oxymoron at Bank of America or Goldman Sachs perhaps?), or taxonomy generation. What happened to search? In the quest for revenues, search vendors change more rapidly than we can track. When this happens, how easy is it to get the exact technical inputs you need without delay? If less than 10 minutes, you have a winner vendor. More than 10 minutes, well, well, well.
  5. Your open source wizard takes a job at Cisco Systems, eHarmony, IBM, or one of the other Open Sourcey firms. You remember the old space launches. I bet this triggers a memory moment, “Houston, we have a problem.” One certainly does.

Check out the IT inferno. Just make sure search is not throwing coke into the oven as someone increases the oxygen flow. You can have a coal fired meltdown without nuclear powered search. AtomicPR, however, will lay down a radioactive blast zone for its various solutions, including XML as a universal search solution. I have heard that phrase before, “universal search.” Those categorical affirmatives are a sixth indicator I believe.

Stephen E Arnold, December 7, 2011

Web Search Engines Ordered to De-Index Hundreds of Sites

December 5, 2011

Interesting action: Law & Disorder reports, “US judge orders hundreds of sites ‘de-indexed’ from Google, Facebook.” And Twitter, and Google+, and Bing, and Yahoo. . . . Why, you ask? For allegedly selling counterfeit Chanel goods.

Chanel, you see, took to the courts in Nevada to go after sites it says were selling knockoffs. The fashion powerhouse’s own people reviewed deliveries and Web sites and testified that the goods were fake. That convinced Judge Kent Dawson, who ordered those site names seized and redirected to a page with a notice of the seizure. He also ordered a total ban on search engine indexing of the sites.

The article asserts:

Missing from the ruling is any discussion of the Internet’s global nature; the judge shows no awareness that the domains in question might not even be registered in this country, for instance, and his ban on search engine and social media indexing apparently extends to the entire world. (And, when applied to US-based companies like Twitter, apparently compels them to censor the links globally rather than only when accessed by people in the US.)

Can he do that? Apparently.

Writer Nate Anderson points out that actions like this may render the whole debate over the Stop Online Piracy Act moot. Similar cases are proceeding apace in other courts, and companies may win control over the Internet that way. Our view is that once content is not listed in a public search system, the content and the Web site to which links point cease to exist. Certain governments remove content from their servers too. What’s this mean to a researcher? If you don’t know what you don’t know, you are informed. How’s that for a way to sharpen the intellectual spikes, gentle reader? Could some banned sites create a different, non-public network? If a search fails in an index, does the content exist? I suppose it depends on which index one searches.

Cynthia Murrell, December 05, 2011

Thanks Be For A Guide to SharePoint Server 2010 Search

November 24, 2011

To understand SharePoint’s FAST Search Server, it’s smart to work your way up by first understanding SharePoint Server 2010 Search. “Configuring Enterprise Search in SharePoint 2010” is a useful guide that covers search features and has lots of screen shots. A handy flow chart visualizes the following:

“SharePoint 2010 search architecture is made up of the Crawler, Indexing Engine, Query Engine and the User Interface and Query Object Model.  We now have greater flexibility and expandability with our search design in 2010 and can setup not only multiple Query Servers but can now scale out our Index server and add multiple instances.”

Savvy businesses know the benefits of collaborative content management with integrated search – add access to the constantly growing information in the Cloud, and company knowledge gets a big boost. For those needing a deeper solution that has the ability to answer enterprise search needs in the cloud, you may want to explore Mindbreeze.

Their information pairing technology results in a complete overview of a company’s knowledge, merging enterprise information with Cloud information.

Sara Wood, November 24, 2011

Brainware and the Back Office

November 23, 2011

Have we been ignoring the back office as a niche for search and content processing? No, we have not ignored this niche.

There is money to be made in handling paper plus digital content, and Brainware wants to convince some organizations that it leads the field. News.Gnome.es clued us in with “Brainware Emerges As Market Leader For Intelligent Data Capture: 2011 Survey.” The survey, conducted by the Institute of Financial Operations, focused on the use of automated data capture to tame companies’ accounts payable. Thomas M. Bohn of the Institute summarizes the results:

These findings demonstrate that accounts payable departments using data capture technology—especially higher volume, complex operations—hold the advantage in reducing costs, improving turnaround times and optimizing accountability over their process. Furthermore, these insights are consistent with the numerous customer case studies I’ve witnessed while hosting events with Brainware this past year.

This press release from Brainware emphasizes that company’s leadership in this area. The enterprise serves many large companies and organizations globally, and boasts that its products “manage unstructured data without templates, exact definitions, taxonomies or indexing.”

It seems the purveyors of search solutions can’t help but invent new classifications as they try to cope with the complexity of their task. We will update this list of 14 silver bullets to include this new category.

Cynthia Murrell, November 23, 2011

Good Content Wins Fans

November 19, 2011

Suddenly Webmasters are chattering about content. After years of tricks, indexing silliness, and downright misleading search engine optimization games, content is popular again.

With the increased popularity of e-books, and the easily accessible tools of creation, distribution and promotion of web content, there has been speculation regarding how this will affect the quality of content being released.

In the TechDirt article “Good Content Doesn’t Get Buried By Bad Content” we learned:

We have no doubt that much new content being produced is, in fact, pretty bad. I’ve never quite understood the argument, though, that bad content harms good content. You just have to ignore the bad content and follow the good content. What that means is that the world just needs good filters, and we keep seeing more and more of those showing up every day.

The write up asserts that, with sites like Amazon, fans are able to show their support for the good books that they love by writing reviews. This helps separate the good content from the bad.

There will always be skeptics out there challenging technological innovations. I would argue that while it may be easier to make your content available for public consumption than in years past, bad content won’t win over the fan base needed to make an impact.

Jasmine Ashton, November 19, 2011
