SharePoint Taxonomy Management Myths
November 22, 2011
Taxodiary reported this week on taxonomy management in the first article of the series “Five Myths About Taxonomy and SharePoint.”
Each myth will be reported on separately, but the first one the article tackles is a big one. Myth: SharePoint now has taxonomy management capabilities. The article states:
“SharePoint has certainly made a major step up by embedding the taxonomy capability within SharePoint however it is missing most of the critical features which make taxonomies so useful. No related terms, management within the term store is so painful even Microsoft employees use an outside tool. The set of taxonomy attributes allowed is very meager, tracking of term changes is nearly no existent, synonyms are not allowed, display space is limited to ten lines of the taxonomy at a time, etc.”
To work around SharePoint’s taxonomy limitations, the article recommends that end users adopt a third party tool to fill in the gaps and seek feedback from user groups such as the SLA Taxonomy Division.
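To make the gap concrete, here is a minimal, hypothetical sketch of the kind of term record a dedicated taxonomy tool keeps: related terms, synonyms, richer attributes, and change tracking, the very features the article says are thin or missing in the SharePoint 2010 term store. The field names are my illustration, not any vendor’s schema.

```python
# Hypothetical term record illustrating features the article says SharePoint's
# term store handles poorly: synonyms, related terms, attributes, change history.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TaxonomyTerm:
    preferred_label: str
    synonyms: List[str] = field(default_factory=list)       # "use for" entries
    related_terms: List[str] = field(default_factory=list)  # associative links
    attributes: Dict[str, str] = field(default_factory=dict)
    change_log: List[str] = field(default_factory=list)     # who changed what, when

term = TaxonomyTerm(
    preferred_label="Enterprise search",
    synonyms=["Findability", "Information retrieval"],
    related_terms=["Taxonomy", "Metadata"],
    attributes={"scope_note": "Search across internal repositories"},
    change_log=["2011-11-22: term created by the taxonomy editor"],
)
print(term.preferred_label, term.synonyms)
```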
We think that Taxodiary hit the nail on the head with this post. The bottom line is that too much of the jabber about taxonomies and controlled vocabularies is uninformed. Attend to the experts, not the self-appointed poobahs.
Jasmine Ashton, November 22, 2011
Sponsored by Pandia.com
Mindbreeze Satisfies Users’ Need for Findability
November 15, 2011
Stephen Fishman of CMS Wire discusses the problems that arise from Microsoft SharePoint’s desire for broad appeal in “SharePoint is Crack and Microsoft is the Pusher.” The title is humorous, but Fishman makes some valid points about Microsoft’s attempt at reeling in the masses only to leave them yearning for more. Much like the touted panaceas Microsoft Access and Lotus Notes, SharePoint does not deliver on its promises.
Fishman drives home his main point after rolling out a list of smaller issues:
“But the worst thing about SharePoint by far is that it recreates the problem it was intended to solve, only on a much larger scale. What starts out as a hierarchically organized file share ends up as a hierarchically organized file share with a web interface on top of it.”
The Fabasoft Mindbreeze solution takes a different tack, as the company’s latest update makes clear: “With the new release, Fabasoft Mindbreeze displays search results clearer and more structured. Index tabs break down search results in specific groups and topics. That way, users see immediately what documents contain the search term and in what context it is mentioned. With this structured overview, users find what they are looking for much faster.”
Fishman also finds fault with SharePoint’s disregard for sound implementation and taxonomies: “SharePoint is constantly rolled out in a slipshod manner with little thought to governance or developing scalable and maintainable taxonomies . . . The resulting organic growth inevitably results in buried content with no easy mechanisms for ambient findability.”
Mindbreeze accounts for synonyms and taxonomies in its search, features that are in place out-of-the-box but also customizable. To solve SharePoint’s lingering issues of findability and a poor user experience, explore an efficient solution like Fabasoft Mindbreeze, built with the user in mind.
*Disclaimer – Mindbreeze is currently upgrading its website. Links will be checked, and if problems arise they will be updated. Thanks for your patience.
Emily Rae Aldridge, November 15, 2011
Business Process Management: Bit Player or Buzz Word?
November 7, 2011
I spoke with one of the goslings who produces content for our different information services. We were reviewing a draft of a write up, and I reacted negatively to the source document and to the wild and crazy notions that find their way into the discussions about “problems” and “challenges” in information technology.
In enterprise search and content management, flag waving is more important than solving customers’ problems. Economic pressure seems to exponentiate the marketing clutter. Are companies with resources “too big to flail”? Nope.
Here’s the draft, and I have put in bold face the parts that caught my attention and prompted my push back:
As the amount of data within a business or industry grows, the question of what to do with it arises. The article “Business Process Management and Mastering Data in the Enterprise,” on Capgemini’s Web site, explains why Business Process Management (BPM) is not the ideal means for managing data.
According to the article, as more and more operational systems are used to store data, the process of synchronizing that data becomes increasingly difficult.
As for using BPM to do the job, the article explains,
While BPM tools have the infrastructure to hold a data model and integrate to multiple core systems, the process of mastering the data can become complex and, as the program expands across ever more systems, the challenges can become unmanageable. In my view, BPMS solutions, with a few exceptions, are not the right place to be managing core data. At the enterprise level, MDM solutions are far more elegant solutions designed specifically for this purpose.
The answer to this ever-growing problem came from combining knowledge from both a data perspective and a process perspective. The article suggests that a Target Operating Model (TOM) would act as a rudder for the projects aimed at synchronizing data. Once that is in place, a common information model would be created with enterprise definitions of the data entities, which would then be populated by general attributes fed by a single process project.
While this is just one man’s answer to the problem of data, it is a start. Regardless of how businesses approach the problem, one thing remains constant: process management alone is not efficient enough to meet the demands of data management.
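Before the push back, a quick aside on the jargon. A “common information model” in this context usually amounts to something like the minimal, hypothetical sketch below: one canonical definition of a core entity which each operational system maps into, so data is synchronized against a single enterprise definition instead of system to system. The entity and field names are my illustration, not Capgemini’s.

```python
# Minimal sketch of a "common information model": one canonical entity definition
# that each source system maps into. System and field names are hypothetical.
from dataclasses import dataclass

@dataclass
class CanonicalCustomer:
    customer_id: str
    legal_name: str
    country: str

def from_crm(record: dict) -> CanonicalCustomer:
    # The CRM keeps the name under "account_name" and the country under "cty".
    return CanonicalCustomer(record["id"], record["account_name"], record["cty"])

def from_billing(record: dict) -> CanonicalCustomer:
    # Billing uses different field names for the same attributes.
    return CanonicalCustomer(record["cust_no"], record["name"], record["country_code"])

crm_record = {"id": "C-100", "account_name": "Acme GmbH", "cty": "DE"}
billing_record = {"cust_no": "C-100", "name": "Acme GmbH", "country_code": "DE"}

# Both systems resolve to the same canonical record, so synchronization happens
# against one definition rather than pairwise between systems.
print(from_crm(crm_record) == from_billing(billing_record))  # True
```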
Here’s my concern. First, I think there are a number of concepts, shibboleths, and smoke screens flying, floating, and flapping. The conceptual clutter is crazy. The “real” journalists dutifully cover these “signals”. My hunch is that most of the folks who like videos gobble these pronouncements like Centrum multivitamins. The idea is that one dose with lots of “stuff” will prevent information technology problems from wreaking havoc on an organization.
Three observations:
First, I think that in the noise, quite interesting and very useful approaches to enterprise information management can get lost. Two good examples: Polyspot in France and Digital Reasoning in the U.S. Both companies have approaches which solve some tough problems. Polyspot offers an infrastructure, search, and apps approach. Digital Reasoning delivers next-generation numerical recipes, what the company calls entity based analytics. Baloney like Target Operating Models does not embrace these quite useful technologies.
Second, the sensitivity of indexes and blogs to public relations spam is increasing. The perception that indexing systems are “objective” is fascinating but simply incorrect. What happens is that a well-heeled firm can output a sequence of spam news releases and then sit back and watch the “real” journalists pick up the arguments and ideas. I wrote about one example of this in “A Coming Dust Up between Oracle and MarkLogic?”
Third, I am considering a longer essay about the problem of confusing Barbara, Desdemona’s mother’s maid, with Othello. Examples include confusing technical methods or standards with magic potions; for instance, taxonomies as a “fix” for lousy findability and search, semantics as a work around for poorly written information, metatagging as a solution to context free messages, etc. What’s happening is that a supporting character, probably added by the compilers of Shakespeare’s First Folio edition, is made into the protagonist. Since many recent college graduates don’t know much about Othello, talking about Barbara as the possible name of the man who played the role in the 17th century is a waste of time. The response I get when I mention “Barbara” while discussing the play is, “Who?” This problem is surfacing in discussions of technology. XML, for example, is not a rabbit from a hat. XML is a way to describe the rabbit-hat-magician content and slice and dice the rabbit-hat-magician without too many sliding panels and dim lights.
What is the point of this management and method malarkey? Sales, gentle reader, sales. Hyperbole, spam, and jargon are the Teflon that slides a deal through.
Stephen E Arnold, November 7, 2011
Sponsored by Pandia.com
Google and the Perils of Posting
October 21, 2011
I don’t want to make a big deal out of a simple human mistake from a button click. I just had eye surgery, and it is a miracle that I can [a] find my keyboard and [b] make any function on my computers work.
However, I did notice this item this morning and wanted to snag it before it magically disappeared due to mysterious computer gremlins. The item in question is “Last Week I Accidentally Posted,” via Google Plus at this URL. I apologize for the notation style, but Google Plus posts come with the weird use of the “+” sign, which is a killer when running queries on some search systems. Also, there is no title, which means this is more of a James Joyce type of writing than a standard news article or even a blog post from the addled goose in Harrod’s Creek.
To get some context you can read my original commentary in “Google Amazon Dust Bunnies.” My focus in that write up is squarely on the battle between Google and Amazon, which I think is a more serious confrontation than the unemployed English teachers, aging hippies turned consultants, and failed yet smarmy Web masters who have reinvented themselves as “search experts” think.
Believe me, Google versus Amazon is going to be interesting. If my research is on the money, the problems between Google and Amazon will escalate to, and may surpass, the tension that exists between Google and Oracle, Google and Apple, and Google and Viacom. (Well, Viacom may be different because that is a personal and business spat, not just big companies trying to grab the entire supply of apple pies in the cafeteria.)
In the Dust Bunnies write up, I focused on the management context of the information in the original post and the subsequent news stories. In this write up, I want to comment on four aspects of this second post about why Google and Amazon are both so good, so important, and so often misunderstood. If you want me to talk about the writer of these Google Plus essays, stop reading. The individual’s name which appears on the source documents is irrelevant.
1. Altering or Idealizing What Really Happened
I had a college professor, Dr. Philip Crane, who told us in history class in 1963, “When Stalin wanted to change history, he ordered history textbooks to be rewritten.” I don’t know if the anecdote is true or not. Dr. Crane went on to become a US congressman, and you know how reliable those folks’ public statements are. What we have in the original document and this apologia is a rewriting of history. I find this interesting because the author could use other methods to make the content disappear. My question: “Why not?” And “Why revisit what was a pretty sophomoric tirade involving a couple of big companies?”
2. Suppressing Content with New Content
One of the quirks of modern indexing systems such as Baidu, Jike, and Yandex is that once content is in the index, it can persist. As more content on a particular topic accretes “around” an anchor document, the document becomes more findable. What I find interesting is that despite the removal of the original post, the secondary post continues to “hook” to discussions of that original post. In fact, the snippet I quoted in “Dust Bunnies” comes from a secondary source. I have noted, and adapted to, “good stuff” disappearing as a primary document. The only evidence of the document’s existence is the secondary references. As these expand, the original item becomes more visible and more difficult to suppress. In short, the author of the apologia is ensuring the findability of the gaffe. Fascinating to me.
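A toy sketch (my illustration, not a description of any real engine’s internals) makes the accretion point concrete: each secondary post that mentions a removed anchor document adds another piece of evidence that keeps the topic findable.

```python
# Toy sketch of the accretion effect: count secondary documents that reference
# a removed "anchor" post. The posts and the anchor phrase are hypothetical.
secondary_posts = [
    "Commentary on the deleted Google Plus post about Amazon",
    "Another write up quoting the deleted Google Plus post",
    "Unrelated note about enterprise search",
]

anchor_phrase = "deleted google plus post"

# A naive findability signal: the more secondary mentions accrete, the easier
# the suppressed topic is to locate, even though the original is gone.
mentions = sum(anchor_phrase in post.lower() for post in secondary_posts)
print(f"Secondary mentions of the anchor: {mentions}")  # prints 2
```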
3. Amazon: A Problem for Google
Search May Be Flawed, but Maybe Content Management Is Worse?
September 14, 2011
I pay some attention to search and content processing. I take a dim view of content management because it is a weird business. Unlike records management, CMS (as the “real” experts label these systems) has few rules. In the tidy world of records management, there are document retention policies. Even when one ventures into the murk of legal content, there is a body of information about how documents must be tattooed before the legal eagles get to add value to the results of a discovery process. And CMS? Most of it is a Wild West rodeo on broken down horses, with amateurs trying to distract the wild bulls and kicking stallions of content creators. I find CMS amusing to watch, but I steer clear. The “governance” experts are swarming over the mess CMS vendors and system users create. Need an example? How about the chaos of many Lotus Notes or Microsoft SharePoint systems? Software just does not impose an editorial policy on employees with degrees in business administration and home economics engaged in writing marketing collateral, proposals, and reports.
I read “Inefficiencies in Collateral Management Cost the Financial Sector More than EUR4 Billion Annually, According to Accenture/Clearstream Survey” and reached two working hypotheses:
First, some information functions may be unmanageable unless rules are set up before the users start cranking out digital outputs. In my experience, this pristine state is tough to achieve. Consequently, most CMS implementations are not going to deliver what users and chief financial officers want. In short, the cost excess is likely to persist. How much money is blown with lousy CMS? Here’s what the article asserts:
The financial services sector could save more than EUR4 billion annually in collateral management costs by addressing operational inefficiencies, according to a survey by Accenture (NYSE: ACN) and Clearstream.
Assume the estimated cost is inflated. Then do a back-of-the-envelope calculation which assumes Manhattan and Tokyo are probably just as inefficient. Bingo: we have a nice fat six billion euro number. Not quite the PIIGS’ debt, but a respectable figure which CMS vendors should have kept more manageable.
Second, what about XML? I thought that Extensible Markup Language and the systems which whip this file type around would have reduced content costs. I am not sure that has happened. I have formulated a notion that XML has become its own worst enemy. There is no single XML, and the costs of pushing verbose objects to and fro, the expense of the humans who must work with systems designed for coding rock stars, and the bewildering diversity of XML tagged content look like deal breakers.
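The verbosity point is easy to see with a toy example. The sketch below (a hypothetical record with made-up element names) shows the same small record in two equally legitimate XML shapes and in a compact alternative; both the size overhead and the shape variability feed the transformation costs mentioned above.

```python
# Illustrative only: the same record serialized three ways to show XML's size
# overhead and the "no single XML" problem. Field names are hypothetical.
import json
import xml.etree.ElementTree as ET

record = {"id": "42", "title": "Collateral report", "author": "Jane Doe"}

# One plausible XML convention: each value as a child element.
doc = ET.Element("document")
for key, value in record.items():
    ET.SubElement(doc, key).text = value
xml_elements = ET.tostring(doc, encoding="unicode")

# Another, equally valid convention: values as attributes on one element.
xml_attributes = ET.tostring(ET.Element("document", record), encoding="unicode")

compact = json.dumps(record)

print(len(xml_elements), xml_elements)      # verbose tag-per-field form
print(len(xml_attributes), xml_attributes)  # same data, different XML shape
print(len(compact), compact)                # compact form for comparison
```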
I don’t want to name any XML search and content vendors. I don’t want to highlight vendors who talk about XML and then slap big price tags on content transformation services. I do want to mention that Exalead is XML friendly and does a good job in my opinion with XML in many different forms. To Exalead’s credit, it handles XML quietly. Tooting the XML tuba has not sunk a musical hook in the financial institutions mentioned in the survey.
I am generally suspicious when I read that a giant consulting firm and a smaller consulting firm have mined their contacts for data. However, in this case, I think that CMS and XML help me understand the persistent problem which exists in certain market sectors. How do you fix the problem? My hunch is that one should hire Accenture. Given the mess that exists, Accenture may buy better lunches for clients than the “real” poobahs, the art history majors, and the self-appointed experts have delivered. Wait. I want to restate this. The problem may be unmanageable. Content is increasing too rapidly, and few outfits have the appetite to spend good money after bad. Just my opinion, formed in a land fond of beets.
Stephen E Arnold, September 15, 2011
Sponsored by Pandia.com
IBM May Need a More Robust Classification Solution
August 18, 2011
According to talk around the water cooler, some IBM content and search units are poking around for a classification “solution”. We think the rumor is mostly big company confusion since IBM already has software available to assess and address an organization’s content classification needs through the use of several components. According to the IBM website:
Most unstructured content is either trapped in silos across the organization or entirely unmanaged “content in the wild.” A majority of that unstructured content can be deemed unnecessary – over-retained, irrelevant, or duplicate – and should be either decommissioned or deleted.
As we understand it, one licenses the Classification Module and/or Content Analytics software to prevent the previously stated problem and to provide content classification.
Sounds great like the ads for IBM mainframes and the promotional information about
But here is a disturbing question for the ArnoldIT goslings who wear blue IBM logos: What if this stuff costs too much and does not deliver on-the-fly classification for real time processing of tweets and Google Plus public content?
Maybe an IBM box of parts with an expensive IBM engineering team is not exactly what some outfits require? Perhaps IBM should look around and maybe snap up one of the hot players in the space. IBM has been announcing partnerships with a number of interesting companies. We track Digital Reasoning and think its technology looks very promising. IBM is in a good position to have an impact in the data analysis space, but it needs tools that go beyond its in house code and its Cognos and SPSS methods, in our opinion.
Jasmine Ashton, August 19, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Thoughts from an Industry Leader: Margie Hlava, Access Innovations
August 4, 2011
Here are some astute observations on the direction of enterprise search from someone who knows what she’s talking about. Library Technology Guides points to an interview with Margie Hlava, president of Access Innovations, in “Access Innovations founder and industry pioneer talks about trends in taxonomy and search.”
Ms. Hlava’s 33 years in the search industry informed her observations on current trends, three of which she sees as significant: Cloud and Software as a Service (SaaS) computing, term mining, and the demand for metadata.
The move to the Cloud and SaaS computing demands more of our hardware, not less, Hlava insists. In particular, broadband networks are struggling to keep up. One advantage of the shift is a declining need to navigate labyrinths of hardware, software, and even internal politics on the client side. Other pluses are the motion toward increased data sharing and service enhancement. Also, more ways to maintain security and intellectual property rights are on the horizon.
Term mining, Hlava says, is “a process involving conceptual extraction using thesaurus terms and their synonyms with a rule-base, then looking for occurrences to create more detailed data maps.” Her company leverages this concept to make the most of clients’ large data sets. She is interested in new angles like mashups, data fusion, visualization, linked data, and personalization, but with a caveat: success in all of these depends on the quality of the data itself. “Rotten data gives rotten results.”
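For a concrete picture of the idea, here is a minimal, hypothetical sketch of rule-base term matching: thesaurus terms and their synonyms are matched against documents to build a simple concept-to-occurrence map. Real term mining systems are far more sophisticated; the thesaurus entries and documents below are made up for illustration.

```python
# Minimal sketch of thesaurus-driven term mining: match preferred terms and
# their synonyms against text, then record where each concept occurs.
import re
from collections import defaultdict

thesaurus = {
    "enterprise search": ["findability", "information retrieval"],
    "taxonomy": ["controlled vocabulary", "term list"],
}

documents = {
    "doc1": "Good findability depends on a maintained controlled vocabulary.",
    "doc2": "A taxonomy supports information retrieval across silos.",
}

def term_map(docs, thesaurus):
    """Return {preferred term: [doc ids]} by matching the term or any synonym."""
    hits = defaultdict(list)
    for doc_id, text in docs.items():
        lowered = text.lower()
        for preferred, synonyms in thesaurus.items():
            variants = [preferred] + synonyms
            if any(re.search(r"\b" + re.escape(v) + r"\b", lowered) for v in variants):
                hits[preferred].append(doc_id)
    return dict(hits)

print(term_map(documents, thesaurus))
# {'enterprise search': ['doc1', 'doc2'], 'taxonomy': ['doc1', 'doc2']}
```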
Ms. Hlava regards taxonomies and other metadata enrichment as the way to bring efficiency to our searches. In that realm, the benefits have only begun:
“In terms of taxonomies and search, ‘I think we have just scratched the surface. With good data, our clients are in a good position to do an incredible array of new and interesting things. Good taxonomies take everything to the next level, forming the basis of not only mashups, but also author networks, project collaborations, deeper and better information retrieval,’ she concluded.”
Wise words from a wise woman. We look forward to observing these predictions take shape as the search industry moves forward. The interview with Margie Hlava can be read in full here.
Access Innovations offers a wide range of content management services. The company has been building its semantic-based solutions for over thirty years and prides itself on its unique tool set and experienced personnel.
Stephen E Arnold, August 4, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Jewish News Archive: Another Hot Curated Vertical Content Source
May 9, 2011
Anne Mintz, the star of the Forbes organization’s information center, shifted direction a while back. She dropped into stealth mode, alerting me to her activities via brief emails. I am delighted to be able to announce her Jewish News Archive project.
The remarkable collection of JTA news reports from 1923 to the present is now available for free at archive.jta.org. Formerly the Jewish Telegraphic Agency, now JTA: The Global News Service of the Jewish People, the organization is a not-for-profit media company similar to the Associated Press. Ms. Mintz, one of the world’s leading experts in business information, told me:
Writing the first draft of Jewish history. The archive of original reporting from around the world documents the Jewish experience of the 20th century, much of it not written about in the mainstream media.
I was delighted with the depth of this new service. She said:
There are more than 7,000 contemporaneous articles reported from Europe between 1937-1945 that document the Holocaust on a daily basis, at least that many documenting the experience of Russian Jews throughout entire reign of Communism, coverage of life in then-Palestine before the new state was inaugurated in 1948, and much more.
You can explore this exceptional resource at http://goo.gl/kPk6d.
If you are one of the video addicts who read Beyond Search, you can get additional information from a nifty YouTube video.
Ms. Mintz, who vies with Marydee Ojala, Barbara Quint, and Ulla de Stricker for the title of best business information expert in the world, told me after I asked about her involvement:
Yes, I worked on the project for four months helping prepare the site for launch on May 9, 2011. The content speaks for itself. One interesting aspect of my role was to help surface the articles on news events that didn’t mention the overall subject, such as the Holocaust and the Six Day War, which of course weren’t referred to as such in the original coverage. Another is making sure that people who search for Sabbath also get stories about Shabbat and Shabbas.
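The Sabbath, Shabbat, and Shabbas point is a classic synonym expansion problem. Here is a minimal, hypothetical sketch of the idea; the synonym groups and documents are made up for illustration, and the archive’s actual implementation is not described in the interview.

```python
# Minimal sketch of synonym expansion at query time: a search for "Sabbath"
# also retrieves documents that say "Shabbat" or "Shabbas".
SYNONYM_GROUPS = [
    {"sabbath", "shabbat", "shabbas"},
]

documents = {
    1: "Community gathers for Shabbat dinner",
    2: "Sabbath observance during wartime",
    3: "JTA covers the Six Day War",
}

def expand(term):
    """Return the term plus any synonyms from the groups above."""
    term = term.lower()
    for group in SYNONYM_GROUPS:
        if term in group:
            return group
    return {term}

def search(query):
    """Return ids of documents containing the query term or any of its synonyms."""
    variants = expand(query)
    return [doc_id for doc_id, text in documents.items()
            if any(v in text.lower() for v in variants)]

print(search("Sabbath"))  # [1, 2]
```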
The shift from running a commercial organization’s information operation to developing curated vertical information services is one that is interesting to me. Most of the curated sites are little more than plays for revenue from online advertising services. Ms. Mintz’s work delivers quality without the search engine optimization baloney. This is a victory for curated content. Ms. Mintz receives a virtual laurel wreath from the team in Harrod’s Creek.
Three quacks for this service. What’s next?
Stephen E Arnold, May 9, 2011
Freebie