Want to Be a Search Expert?

November 18, 2011

I saw a story at CBS News’ Web site. The write up “25 College Majors with the Highest Unemployment Rates” was as troubling as it was amusing. I learned yesterday that at some of the top engineering, science, and mathematics recruiting events, US companies were in the minority. In the good old days, before Booz, Allen & Hamilton became an azure chip consultant and McKinsey executives donned orange jump suits, college recruitment was a hunting ground for big name US outfits. The idea was to snag the people who were the “right package” for the plum jobs at top line consulting firms, investment banks, and Fortune 50 companies. I did a couple of recruiting swings in the mid 1970s for Halliburton NUS and later for the pre-Daedalus Booz, Allen & Hamilton. I find the brain drain which sucks talent from the US to hot spots like Brazil, China, and South Korea fascinating.

The CBS story reminded me that self appointed experts will probably come to search, content processing, “big data”, and other fields of mass confusion from these disciplines. What will tomorrow’s “experts” bring to the table in terms of subject matter expertise? Here are the top 10 college majors with the allegedly highest unemployment rates:

  1. Clinical psychology 19.5%
  2. Miscellaneous fine arts 16.2%
  3. United States history 15.1%
  4. Library science 15.0% (tie)
  5. Military technologies; educational psychology 10.9%
  6. Architecture 10.6%
  7. Industrial & organizational psychology 10.4%
  8. Miscellaneous psychology 10.3%
  9. Linguistics & comparative literature 10.2%
  10. (tie) Visual & performing arts; engineering & industrial management 9.2%

You will want to digest the entire list at the link provided.

A couple of comments. I got a hearty laugh when I mentioned that my focus in college was medieval religious sermons in Latin. No one laughed when I mentioned that I wasn’t reading the documents. I was indexing them using punched cards. But notice that “miscellaneous fine arts” does leave about 84 percent of those with that training employed. The top spot, which surprised me, was clinical psychology. I will not forget my early consulting project for T George Harris, then the publisher of Psychology Today. I recall him describing those with degrees in psychology as “crazy” and then dividing psychologists into two broad categories: those who watched interactions among male and female rats and those who did math.

Notice that unemployment rates for visual and performing arts graduates and engineering and industrial management graduates are “only” 9.2 percent. Presumably some of the most talented engineering people with jobs will be working outside the United States.

What about the azure chip consulting firms and the self appointed experts? My thought is that the work product of these outfits will reflect the talent applying themselves to these disciplines. Ever wonder why so many firms are in financial trouble? Ever ask, “Which management consultancy was helping these folks?”

Ever ask, “Why did that enterprise search project fail?” Ever ask, “Which search or content processing consultant advised that outfit?”

Good questions to ask.

Stephen E Arnold, November 18, 2011

Sponsored by Pandia.com

Search Acquisitions

November 18, 2011

One of my two or three readers sent me a link to “Acquisition: The Elephant in the Meeting Room.” I don’t have strong feelings one way or the other about Mongoose, the write up, or the enterprise search sector. I have identified some of the buzzwords used to dance around the little-discussed problem of lousy enterprise search systems. If you want to catch up on the obfuscation in which marketers and “real” consultants are entangled, you may find “Search Silver Bullets, Elixirs, and Magic Potions: Thinking about Findability in 2012” a thought starter.

The main point of the Elephant article, it seems to me, is summarized in this passage:

Should you be wary of acquisitions? Not as much as you might read in the blogs and professional communities.

The write up mentions a number of high profile acquisitions and provides some color for the reasons behind the deals. My view of some of the recent deals is different from the Mongoose write up. I suppose that is because, at age 67, I have watched and participated in the sale of large and small companies. I learned in my work at Booz, Allen & Hamilton, before it became an azure chip firm, that the reasons for a corporate action are often difficult to discern from the outside looking in.

The table below provides a rundown of my personal take on why certain deals took place.


Funnelback 11 Released With New and Improved Features

November 12, 2011

Funnelback, a website and enterprise search provider, launched version 11 of its product on October 1st of this year. Funnelback 11 is available for Windows and Linux and as a cloud service. It includes an automated tuning engine and a search-driven SEO assistant.

Funnelback 11 also has new features like updatable indexes, efficient crawling, 64-bit indexing and a new high performance search interface.

In the Funnelback news release “Funnelback 11 Launched with Automated Tuning and SEO Assistant,” Managing Director Brett Matson said of the product:

“Funnelback 11 has the ability to continually and automatically optimise its ranking using a correct answer set determined by the customer. This enables customers to intuitively adjust the search engine ranking algorithm to ensure it continuously adapts and is optimised to the ever-changing characteristics of their own information environment. A related benefit is that it exposes how effectively the search engine is ranking,”
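The release does not say how Funnelback scores its ranking against a customer’s “correct answer set.” One common, generic yardstick is mean reciprocal rank (MRR): for each query, take 1/rank of the first correct result, then average across queries. The Python sketch below is a minimal illustration under that assumption, not Funnelback’s method; the queries and document IDs are hypothetical.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked_ids: List[str], correct_ids: Set[str]) -> float:
    """Return 1/rank of the first correct result, or 0.0 if none appears."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in correct_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results: Dict[str, List[str]],
                         answers: Dict[str, Set[str]]) -> float:
    """Average the reciprocal rank over every query in the answer set.

    Queries in the answer set for which the engine returned nothing score 0.0.
    """
    if not answers:
        return 0.0
    total = sum(reciprocal_rank(results.get(query, []), correct)
                for query, correct in answers.items())
    return total / len(answers)


# Hypothetical customer answer set and engine output for two queries.
answers = {"annual leave": {"hr-policy-7"}, "expense form": {"fin-form-2"}}
results = {
    "annual leave": ["news-12", "hr-policy-7", "hr-faq-1"],
    "expense form": ["fin-form-2", "fin-guide-9"],
}
print(mean_reciprocal_rank(results, answers))  # 0.75
```

An automated tuner could, in principle, adjust ranking weights and keep the settings that raise such a score; that loop is left out of this sketch.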

Regardless of the high praise Funnelback is giving itself, our take on Funnelback 11, and on this release in particular, is that it is an annoying display of self-congratulation and that the company may be trying a bit too hard to impress.

Jasmine Ashton, November 12, 2011

Search Silver Bullets, Elixirs, and Magic Potions: Thinking about Findability in 2012

November 10, 2011

I feel expansive today (November 9, 2011), generous even. My left eye seems to be working at 70 percent capacity. No babies are screaming in the airport waiting area. In fact, I am sitting in a not too sticky seat, enjoying the announcements about keeping pets in their cage and reporting suspicious packages to law enforcement by dialing 250.

I wonder if the mother who left a pink and white plastic bag with a small bunny and a box of animal crackers is evil. Much in today’s society is crazy marketing hype and fear mongering.

Whilst thinking about pets in cages and animal crackers which may be laced with rat poison, and plump, fabric bunnies, my thoughts turned to the notion of instant fixes for horribly broken search and content processing systems.

I think it was the association with failing societal systems, the ones that assume passengers at the gate would let a pet run wild or that a stuffed bunny is a threat. My thoughts jumped to the world of search, its crazy marketing pitches, and the satraps who have promoted themselves to “expert in search.” I wanted to capture these ideas, conforming to the precepts of the About section of this free blog. Did I say “free”?

A happy quack to http://www.alchemywebsite.com/amcl_astronomical_material02.html for this image of the 21st century azure chip consultant, a self appointed expert in search with a degree in English and a minor in home economics with an emphasis on finger sandwiches.

The Silver Bullets, Garlic Balls, and Eyes of Newts

First, let me list the instant fixes, the silver bullets, the magic potions, the faerie dust, and the alchemy which makes “enterprise search” work today. Fasten your alchemist’s robe, lift your chin, and grab your paper cone. I may rain on your magic potion. Here are 14 magic fixes for a lousy search system. Oh, one more caveat. I am not picking on any one company or approach. The key to this essay is the collection of pixie dust, not a single firm’s blend of baloney, owl feathers, and goat horn.

  1. Analytics (The kind of equations some of us wrangled and struggled with in Statistics 101, or the more complex predictive methods which, if you know how to make the numerical recipes work, will get you a job at Palantir, Recorded Future, SAS, or one of the other purveyors of wisdom based on big data number crunching)
  2. Cloud (Most companies in the magic elixir business invoke the cloud. Not even Macbeth’s witches do as good a job with the incantation of Hadoop the Loop as Cloudera, but there are many contenders in this pixie concoction. Amazon comes to mind, but A9 gives me a headache when I use A9 to locate a book for my trusty eReader.)
  3. Clustering (Which I associate with Clustify and Vivisimo, but Vivisimo has morphed clustering into “information optimization” and gets a happy quack for this leap)
  4. Connectors (One cannot search unless one can acquire content. I like the Palantir approach, which triggered some push back, but I find the morphing of ISYS Search Software a useful touchstone in this potion category)
  5. Discovery systems (My associative thought process offers up Clearwell Systems and Recommind. I like Recommind, however, because it is so similar to Autonomy’s method and it has been the pivot for the company’s flip flop from law firms to enterprise search and back to eDiscovery in the last 12 or 18 months)
  6. Federation (I like the approach of Deep Web Technologies, and for the record, the company does not position its method as a magical solution. But some federating vendors do, so I will mention this concept. Think mash up and data fusion too.)
  7. Natural language processing (My candidate for NLP wonder worker is Oracle, which acquired InQuira. InQuira is a success story because it was formed from the components of two antecedent search companies, pitched NLP for customer support, and got acquired by Oracle. Happy stakeholders all.)
  8. Metatagging (Many candidates here. I nominate the Microsoft SharePoint technology as the silver bullet candidate. SharePoint search offers almost flawless implementation of finding a document by virtue of knowing who wrote it, when, and what file type it is. Amazing. A first of sorts, because the method has spawned third party solutions from Austria to the United States.)
  9. Open source (Hands down I think about IBM. From Content Analytics to the wild and crazy Watson, IBM has open source tattooed over large expanses of its corporate hide. Free? Did I mention free? Think again. IBM did not hit $100 billion in revenue by giving software away.)
  10. Relationship maps (I have to go with the Inxight Software solution. Not only was the live map an inspiration to every business intelligence and social network analysis vendor, but it was also cool to drag objects around. Now Inxight is part of Business Objects, which is part of SAP, an interesting company occupied with reinventing itself and one that ignored TREX, a search engine)
  11. Semantics (I have to mention Google as the poster child for making software know what content is about. I stand by my praise of Ramanathan Guha’s programmable search engine and the somewhat complementary work of Dr. Alon Halevy, both happy Googlers as far as I know. Did I mention that Google has oodles of semantic methods, but the focus is on selling ads and Pandas, which are somewhat related?)
  12. Sentiment analysis (The winner in the sentiment analysis sector is up for grabs. In terms of reinventing and repositioning, I want to acknowledge Attensity. But when it comes to making lemonade from lemons, check out Lexalytics (now a unit of Infonics). I like the Newssift case, but that is not included in my free blog posts, and information about this modest multi-vehicle accident on the UK information highway is harder and harder to find. Alas.)
  13. Taxonomies (I am a traditionalist, so I quite like the pioneering work of Access Innovations. But firms run by individuals who are not experts in controlled vocabularies, machine assisted indexing, and ANSI compliance have captured the attention of the azure chip, home economics, and self appointed expert crowd. Access Innovations knows its stuff. Some of the boot camp crowd, maybe somewhat less? I read a blog post recently that said librarians are not necessary when one creates an enterprise taxonomy. My, how interesting. When we did the ABI/INFORM and Business Dateline controlled vocabularies, we used “real” experts and quite a few librarians with experience conceptualizing, developing, refining, and ensuring the logical consistency of our word lists. It worked: even the shadow of the original ABI/INFORM still uses some of our terms 30 plus years later. There are so many taxonomy vendors that I will not attempt to highlight others. Even Microsoft signed on with Cognition Technologies to beef up its methods.)
  14. XML (There are Google and MarkLogic again. XML is now a genuine silver bullet. I thought it was a markup language. Well, not any more, pal.)


Mindbreeze: A View from the Top

November 9, 2011

Fabasoft Mindbreeze managing director Daniel Fallmann gives his insight to KM World in “Mindbreeze, Managing Director, Daniel Fallmann: View from the Top.”

Using open standards, Mindbreeze offers high-performance enterprise search and digital cognition for all kinds of enterprises. We have developed context-enriching indexing services, which are available without time-consuming set up procedures. Information access without ironclad security is not a solution. Fabasoft Mindbreeze ensures that only authorized users can access the information. Our product was designed from the beginning to be installed quickly in minutes, thus obviating expensive installation processes. The Fabasoft Mindbreeze Appliance can be up and running for your users in just a matter of hours.

Fallmann, the Fabasoft Mindbreeze founder, talks about his Austrian start-up on this brief video. He is able to succinctly explain how the Mindbreeze solution assists users with internal and external search.

Saving the user from lengthy installation and clunky customization, Mindbreeze seamlessly integrates into an existing platform. Semantic recognition enhances search results, providing results that are not only quick but relevant. Third-party application data is available to mobile devices through Fabasoft Mindbreeze Mobile. Standard installations such as Microsoft SharePoint can lack versatility, and customization becomes lengthy and difficult.

Evaluate your enterprise needs and see if Fabasoft Mindbreeze and its highly efficient solutions might be the right choice for your organization. In Fallmann’s words, “Make informed decisions.”

Emily Rae Aldridge, November 9, 2011


Business Process Management: Bit Player or Buzz Word?

November 7, 2011

I spoke with one of the goslings who produces content for our different information services. We were reviewing a draft of a write up, and I reacted negatively to the source document and to the wild and crazy notions that find their way into the discussions about “problems” and “challenges” in information technology.

In enterprise search and content management, flag waving is more important than solving customers’ problems. Economic pressure seems to exponentiate the marketing clutter. Are companies with resources “too big to flail”? Nope.

Here’s the draft, and I have put in bold face the parts that caught my attention and push back:

As the amount of data within a business or industry grows, the question of what to do with it arises. The article “Business Process Management and Mastering Data in the Enterprise,” on Capgemini’s Web site, explains why Business Process Management (BPM) is not the ideal means for managing data.

According to the article, as more and more operations are used to store data, the process of synchronizing the data becomes increasingly difficult.

As for using BPM to do the job, the article explains,

While BPM tools have the infrastructure to hold a data model and integrate to multiple core systems, the process of mastering the data can become complex and, as the program expands across ever more systems, the challenges can become unmanageable. In my view, BPMS solutions with a few exceptions are not the right place to be managing core data. At the enterprise level MDM solutions are far more elegant solutions designed specifically for this purpose.

The answer to this ever-growing problem was found by combining knowledge from both a data perspective and a process perspective. The article suggests that a Target Operating Model (TOM) would act as a rudder for the projects aimed at synchronizing data. Once that was in place, a common information model would be created with enterprise definitions of the data entities, which would then be populated by general attributes fed by a single process project.

While this is just one man’s answer to the problem of data, it is a start. Regardless of how businesses approach the problem, one point remains constant: process management alone is not efficient enough to meet the demands of data management.

Here’s my concern. First, I think there are a number of concepts, shibboleths, and smoke screens flying, floating, and flapping. The conceptual clutter is crazy. The “real” journalists dutifully cover these “signals”. My hunch is that most of the folks who like videos gobble these pronouncements like Centrum multivitamins. The idea is that one dose with lots of “stuff” will prevent information technology problems from wreaking havoc on an organization.

Three observations:

First, I think that in the noise, quite interesting and very useful approaches to enterprise information management can get lost. Two good examples: Polyspot in France and Digital Reasoning in the U.S. Both companies have approaches which solve some tough problems. Polyspot offers an infrastructure, search, and apps approach. Digital Reasoning delivers next-generation numerical recipes, what the company calls entity based analytics. Baloney like Target Operating Models does not embrace these quite useful technologies.

Second, the sensitivity of indexes and blogs to public relations spam is increasing. The perception that indexing systems are “objective” is fascinating, just incorrect. What happens then is that a well heeled firm can output a sequence of spam news releases and then sit back and watch the “real” journalists pick up the arguments and ideas. I wrote about one example of this in “A Coming Dust Up between Oracle and MarkLogic?”

Third, I am considering a longer essai about the problem of confusing Barbara, Desdemona’s mother’s maid, with Othello. Examples include confusing technical methods or standards with magic potions; for instance, taxonomies as a “fix” for lousy findability and search, semantics as a work around for poorly written information, metatagging as a solution to context free messages, etc. What’s happening is that a supporting character, probably added by the compilers of Shakespeare’s First Folio edition, is made into the protagonist. Since many recent college graduates don’t know much about Othello, talking about Barbara as the possible name of the man who played the role in the 17th century is a waste of time. The response I get when I mention “Barbara” when discussing the play is, “Who?” This problem is surfacing in discussions of technology. XML, for example, is not a rabbit from a hat. XML is a way to describe the rabbit-hat-magician content and slice and dice the rabbit-hat-magician without too many sliding panels and dim lights.

What is behind all this management and method malarkey? Sales, gentle reader, sales. Hyperbole, spam, and jargon are Teflon to get a deal.

Stephen E Arnold, November 7, 2011


Spotlight: Mindbreeze on the SharePoint Stage

November 1, 2011

This is a new feature, mentioned in the Beyond Search story “Software and Smart Content.” We will be taking a close look at some vendors. Some will be off the board; for example, systems which have been acquired and whose feature sets are, for all practical purposes, frozen. I have enlisted Abe Lederman, one of the founders of Verity (now a unit of Autonomy and Hewlett Packard) and now the chief executive of Deep Web Technologies.

Our first company under the spotlight is Mindbreeze, which is a unit of Fabasoft, one of the leading, if not the leading, Microsoft partners in Austria. Based in Linz, Mindbreeze offers a remarkably robust search and content processing solution.

The company is a leader in adding functionality to basic search, finding, and indexing tasks in organizations worldwide. In August 2011, CMSWire’s “A Strategic Look at SharePoint: Economics, Information & People” made this point:

SharePoint continues to grow in organizations of all sizes, from document collaboration and intranet publishing, to an increasing focus on business process workflows, internet and extranets. Today, many organizations are now in flight with their 2010 upgrades, replacing other portals and ECM applications, and even embracing social computing all on SharePoint.

The Mindbreeze system, according to Daniel Fallmann, the individual who was the mastermind behind the Mindbreeze technology, “snaps in” to Microsoft SharePoint and addresses many of the challenges that a SharePoint administrator encounters when trying to respond to diverse user needs in search and retrieval. In as little as a few hours, maybe a day, a company struggling to locate information in a SharePoint installation can be finding documents using a friendly, graphical interface.

My recollection of Mindbreeze is that it was a “multi stage” service oriented architecture. For me, this means that system administrators can configure the system from a central administrative console and work through the graphical set up screens to handle content crawling (acquisition), indexing, and querying.
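The crawl (acquisition), index, and query stages mentioned above can be illustrated with a toy pipeline. This is a minimal Python sketch, not Mindbreeze’s architecture: the hypothetical “crawler” simply copies an in-memory dictionary, the index is a plain inverted index, and queries use Boolean AND. All function names and documents are invented for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def crawl(sources: Dict[str, str]) -> Dict[str, str]:
    """Acquisition stage: here it just copies an in-memory {doc_id: text} mapping."""
    return dict(sources)


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Indexing stage: map each lowercase term to the IDs of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index


def query(index: Dict[str, Set[str]], q: str) -> List[str]:
    """Query stage: return sorted IDs of documents containing every query term."""
    terms = re.findall(r"\w+", q.lower())
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


docs = crawl({"a": "Sales forecast for Q4", "b": "Q4 travel policy", "c": "Sales contacts"})
idx = build_index(docs)
print(query(idx, "sales q4"))  # ['a']
```

Keeping the stages as separate functions mirrors the idea of configuring each stage on its own; a real system would add connectors, security trimming, and ranking.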

The system supports mobile search and can support “apps,” which are quickly becoming the preferred method of accessing certain types of reports. The idea is that a Mindbreeze user from sales can access the content needed prior to a sales call from a mobile device.

According to Andreas Fritschi, a government official at Canton Thurgau:

Fabasoft Mindbreeze Enterprise makes our everyday work much easier. This is also an advantage for our citizens. They receive their information much faster. This software can be used by people in all sectors of public administration, from handling enquiries to people in management.

Why is the tight integration with Microsoft SharePoint important? There are three reasons that our work in search and content processing highlights.

First, there are more than 100 million SharePoint installations and most of the Fortune 1000 are using SharePoint to provide employees with content management, collaboration, and specialized search-centric functions such as locating a person with a particular area of knowledge in one’s organization. With Mindbreeze, these functions become easier to use and require no custom coding to implement within a SharePoint environment.

Second, users are demanding answers, not laundry lists. The Mindbreeze approach allows a licensee to set up the system to deliver exactly what a group of users or a single user requires. The tailoring occurs within the Fabasoft and Mindbreeze “composite content environment.” Fabasoft and Mindbreeze deliver easy-to-use configuration tools. Mash ups are a few clicks away.

Third, Mindbreeze makes use of the Fabasoft work flow technology. Information can be moved from Point A to Point B without requiring users to change their work behaviors. As a result, user satisfaction rises.

You can learn more about Mindbreeze at www.mindbreeze.com. Information about Fabasoft and its technology is at www.fabasoft.com.

Stephen E Arnold, November 1, 2011


The Perils of Searching in a Hurry

November 1, 2011

I read the Computerworld story “How Google Was Tripped Up by a Bad Search.” I assume that it is pretty close to events as the “real” reporter summarized them.

Let me say that I am not too concerned about the fact that Google was caught in a search trip wire. I am concerned with a larger issue, and one that is quite important as search becomes indexing, facets, knowledge, prediction, and apps. The case reported by Computerworld applies to much of “finding” information today.

Legal matters are rich with examples of big outfits fumbling a procedure or making an error under the pressure of litigation or even contemplating litigation. The Computerworld story describes an email which may be interpreted as having a bright LED to shine on the Java in Android matter. I found this sentence fascinating:

Lindholm’s computer saved nine drafts of the email while he was writing it, Google explained in court filings. Only to the last draft did he add the words “Attorney Work Product,” and only on the version that was sent did he fill out the “to” field, with the names of Rubin and Google in-house attorney Ben Lee.

Ah, the issue of versioning. How many content management experts have ignored this issue in the enterprise? When search systems index, does one want every version indexed or just the “real” version? Oh, and what is the “real” version? A person has to investigate and then make a decision. Software and azure chip consultants, governance and content management experts, and busy MBAs and contractors are often too busy to perform this work. Grunt work, some may call it.
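One naive answer to the versioning question is to index only the most recently saved draft of each document. The Python sketch below assumes, hypothetically, that every draft carries a shared document key and a save timestamp; many real content stores offer no reliable shared key. Note that “newest” is only a proxy for “real”; deciding which version counts still takes a person.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass
class DocVersion:
    doc_key: str        # hypothetical identifier shared by all drafts of one document
    saved_at: datetime  # when this draft was saved
    text: str


def latest_versions(versions: Iterable[DocVersion]) -> List[DocVersion]:
    """Keep only the most recently saved draft for each doc_key."""
    newest: Dict[str, DocVersion] = {}
    for v in versions:
        current = newest.get(v.doc_key)
        if current is None or v.saved_at > current.saved_at:
            newest[v.doc_key] = v
    return list(newest.values())


drafts = [
    DocVersion("email-42", datetime(2011, 1, 3, 10, 0), "draft 1"),
    DocVersion("email-42", datetime(2011, 1, 3, 10, 5), "draft 2"),
    DocVersion("email-42", datetime(2011, 1, 3, 10, 9), "final"),
    DocVersion("memo-7", datetime(2011, 1, 1, 9, 0), "memo"),
]
for v in latest_versions(drafts):
    print(v.doc_key, v.text)  # prints "email-42 final", then "memo-7 memo"
```

Dropping the earlier drafts shrinks the index, but it also hides them from search, which is exactly the wrong outcome when the drafts themselves matter.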

What I am considering is the confluence of people who assume “search” works, the lack of time Outlook and iCalendar “priority one” people face, and the reluctance to sit down and work through documents in a thorough manner. This is part of the “problem” with search, and software is not going to resolve the problem quickly, if ever.

Source: http://www.clipartguide.com/_pages/0511-1010-0617-4419.html

What struck me is how people in a hurry, assumptions about search, and legal procedures underscore a number of problems in findability. But the key paragraph in the write up, in my opinion, was:

It’s unclear exactly how the email drafts slipped through the net, and Google and two of its law firms did not reply to requests for comment. In a court filing, Google’s lawyers said their “electronic scanning tools” — which basically perform a search function — failed to catch the documents before they were produced, because the “to” field was blank and Lindholm hadn’t yet added the words “attorney work product.” But documents produced for opposing counsel should normally be reviewed by a person before they go out the door, said Caitlin Murphy, a senior product manager at AccessData, which makes e-discovery tools, and a former attorney herself. It’s a time-consuming process, she said, but it was “a big mistake” for the email to have slipped through.
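As reported, the “electronic scanning tools” keyed on the “to” field and the “attorney work product” label. The Python sketch below is a hypothetical rule of that general kind, not Google’s or any vendor’s actual tool; the markers and the counsel address are invented for illustration. It shows why an early draft with the same substance slips through.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    to: List[str]
    body: str


# Hypothetical screening criteria; real e-discovery tools use far richer rules.
PRIVILEGE_MARKERS = ("attorney work product", "privileged")
COUNSEL_ADDRESSES = {"counsel@example.com"}


def flag_as_privileged(msg: Message) -> bool:
    """Flag a message if it carries a privilege marker or is addressed to counsel."""
    text = msg.body.lower()
    has_marker = any(marker in text for marker in PRIVILEGE_MARKERS)
    to_counsel = any(addr.lower() in COUNSEL_ADDRESSES for addr in msg.to)
    return has_marker or to_counsel


sent = Message(to=["counsel@example.com"], body="Attorney Work Product. Notes on options.")
draft = Message(to=[], body="Notes on options.")
print(flag_as_privileged(sent))   # True
print(flag_as_privileged(draft))  # False: same substance, no label, blank "to" field
```

The rule is only as good as the metadata it checks, which is why the AccessData comment about human review matters.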

What did I think when I read this?

First, all the baloney (yep, the right word, folks) about search, facets, metadata, indexing, clustering, governance, and analytics underscores something I have been saying for a long, long time. Search is not working the way lots of people assume it does. You can substitute “eDiscovery,” “text mining,” or “metatagging” for search. The statement holds water for each.

The algorithms will work within limits, but the problem with search has to do with language. Software, no matter how sophisticated, gets fooled by missing data elements, versions, and words themselves. It is high time that the people yapping about how wonderful automated systems are stop and ask themselves this question, “Do I want to go to jail because I assumed a search or content processing system was working?” I know my answer.

Second, in the Computerworld write up, the user’s system dutifully saved multiple versions of the document. Okay, SharePoint lovers, here’s a question for you: Does your search system make clear which antecedent version is which and which document is the best and final version? We know from the Computerworld write up that the Google system did not make this distinction. My point is that the nifty sounding yap about how “findable” a document is remains mostly baloney. Azure chip consultants and investment banks can convince themselves and the widows from whom money is derived that a new search system works wonderfully. I think the version issue makes clear that most search and content processing systems still have problems with multiple instances of documents. Don’t believe me? Go look for the drafts of your last PowerPoint. Now, to whom did you email a copy? From whom did you get inputs? Which set of slides was on the laptop you used for the briefing? What was the “correct” version of the presentation? If you cannot answer these questions, how will software?


Software and Smart Content

October 30, 2011

I was moving data from Point A to Point B yesterday, filtering junk that has marginal value. I scanned a news story from a Web site which covers information technology with a Canadian perspective. The story was “IBM, Yahoo turn to Montreal’s NStein to Test Search Tool.” In 2006, IBM was a pace-setter in search development cost control. The company was relying on the open source community’s Lucene technology, not the wild and crazy innovations from Almaden and other IBM research facilities. Web Fountain and jazzy XML methods were promising ways to make dumb content smart, but IBM needed a way to deliver bread-and-butter findability at a sustainable, acceptable cost. The result was OmniFind. When we tested the Yahoo OmniFind edition after it became available, I made this note to myself:

Installation was fine on the IBM server. Indexing seemed sluggish. Basic search functions generated a laundry list of documents. Ho hum.

Maybe this comment was unfair, but five years ago, there were arguably better search and retrieval systems. I was in the midst of the third edition of the Enterprise Search Report, long since bastardized by the azure chip crowd and the “real” experts. But we had a test corpus, lots of hardware, and an interest in seeing for ourselves how tough it was to get an enterprise search system up and running. Our impression was that most people would slam in the system, skip the fancy stuff, and move on to more interesting things such as playing Foosball.

Thanks to Adobe for making software that creates a need for Photoshop training. Source: http://www.practical-photoshop.com/PS2/pages/assign.html

Smart, Intelligent… Information?

According to this blast-from-the-past article, NStein’s product in 2006 was “an intelligent content management product used by media companies such as Time Magazine and the BBC, and a text mining tool called NServer.” The idea was to use search plus a value adding system to improve the enterprise user’s search experience.

Now consider the use of the word “intelligent” to describe a content processing system. The usage reaches back through the decades to computer aided logistics and forward to Extensible Markup Language methods.

The idea of “intelligent” is a pregnant one, with a gestation period measured in decades.

Flash forward to the present. IBM markets OmniFind and a range of products which provide basic search as a utility function. NStein is a unit of OpenText, and it has been absorbed into a conglomerate with a number of search systems. The investment needed to update, enhance, and extend BASIS, BRS Search, NStein, and the other systems OpenText “sells” is a big number. “Intelligent content” has not been an OpenText buzzword for a couple of years.

The torch has been passed to conference organizers and a company called Thoora, which “combines aggregation, curation, and search for personalized news streams.” You can get some basic information in the TechCrunch article “Thoora Releases Intelligent Content Discovery Engine to the Public.”

In two separate teleconference calls last week (October 24 to 28, 2011), “intelligent content” came up. In one call, the firm was explaining that traditional indexing systems missed important nuances. By processing a wide range of content and querying a proprietary index of the content, the information derived from the content would be more findable. When a document was accessed, the content was “intelligent”; that is, the document contained value added indexing.

The second call focused on the importance of analytics. The content processing system would ingest a wide range of unstructured data, identify items of interest such as the name of a company, and use advanced analytics to make relationships and other important facets of the content visible. The documents were decomposed into components, and each of the components was “smart”. Again, the idea was that each fact or component of information was related to the original document and to the processed corpus of information.

No problem.
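Stripped of the marketing, both pitches describe a familiar pattern: split documents into components, attach index terms to each component, keep a pointer back to the source document, and note which entities show up together across the corpus. Here is a minimal Python sketch of that pattern. It is not either vendor's method, which was not disclosed; it assumes a hypothetical hand-built list of company names (`KNOWN_COMPANIES`), naive sentence splitting, and sentence-level co-occurrence as a stand-in for “advanced analytics.”

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical gazetteer of company names; a real system would draw on a
# curated vocabulary or a trained extractor instead.
KNOWN_COMPANIES = {"IBM", "NStein", "OpenText", "Thoora"}


def split_components(text):
    """Naively split a document into sentence-sized components."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_entities(component):
    """Return the known company names that appear as whole tokens in a component."""
    tokens = set(re.findall(r"[A-Za-z]+", component))
    return tokens & KNOWN_COMPANIES


def build_index(corpus):
    """Index each entity by (doc_id, component_number) and count co-occurring pairs."""
    index = defaultdict(list)
    pairs = Counter()
    for doc_id, text in corpus.items():
        for n, component in enumerate(split_components(text)):
            entities = tag_entities(component)
            # Each component keeps a pointer back to its source document.
            for entity in sorted(entities):
                index[entity].append((doc_id, n))
            # Entities mentioned in the same component are counted as "related".
            for a, b in combinations(sorted(entities), 2):
                pairs[(a, b)] += 1
    return dict(index), pairs


corpus = {
    "doc1": "NStein is a unit of OpenText. IBM markets OmniFind.",
    "doc2": "OpenText sells several search systems, including NStein.",
}
index, pairs = build_index(corpus)
print(index["OpenText"])              # [('doc1', 0), ('doc2', 0)]
print(pairs[("NStein", "OpenText")])  # 2
```

Counting co-occurrence within a sentence is about the crudest notion of a “relationship” available; it is used here only because it is easy to follow.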

Shift in Search

We are witnessing another one of those abrupt shifts in enterprise search. Here’s my working hypothesis. (If you harbor a lifelong love of marketing baloney, quit reading because I am gunning for this pressure point.)

First, let’s face it. Enterprise search is just not revving the engines of the people in information technology or in the chief financial officer’s office. Money pumped into search typically generates a large number of user complaints, security issues, and cost spikes. As content volume goes up, so do costs. The enterprise is not Google-land, and money is limited. The content is quite complex, and who wants to crack the nut of 21st century data flows with 1990s technology? Not I. So something hotter is needed.

Second, the hottest trends in “search” have nothing to do with search whatsoever. One example is conflating the interface with precision and recall. Sorry. Does not compute for me. The other angle is “mobile.” Sure, search will work when everything is monitored and “smart” software provides a statistically appropriate method that, it suggests, will work “most” of the time. There is also the baloney about apps, which is little more than the gamification of what in many cases might be better served by a system that makes the user confront actual data, not an abstraction of data. What this means is that people are looking for a way to provide information access without having to grunt around in the messy innards of editorial policies, precision, recall, and other tasks that are intellectually rigorous in a way that Angry Birds interfaces for business intelligence are not.

Third, companies engaged in content access are struggling for revenue. Sure, the best of the search vendors have been purchased by larger technology companies. These acquisitions guarantee three things.

  1. The Wild West spirit of the innovative content processing vendors is essentially going to be stamped out. Creativity will be herded into the corporate killing pens, and the “team” will be rendered as meat products for a technology McDonald’s.
  2. The cash sinkholes that were the search vendors’ research programs will be filled with procedure manuals and forms. There is no money for blue sky problem solving to crack the tough problems in information retrieval at a Fortune 1000 company. Cash can be better spent on things that may actually generate a return. After all, if the search vendors were so smart, why did most companies hit revenue ceilings and have to turn to acquisitions to generate growth? Of the firms unable to grow revenues, some just fiddled the books. Others had to get injections of cash like a senior citizen in the last six months of life in a care facility. So acquired companies are not likely to be hot beds of innovation.
  3. The pricing mechanisms which search vendors have so cleverly hidden, obfuscated, and complexified will be tossed out the window. When a technology is a utility, giant corporations will incorporate some of the technology in other products to make a sale.

What we have, therefore, is a search marketplace where the most visible and arguably successful companies have been acquired. The companies still in the marketplace now have to market like the Dickens and figure out how to cope with free open source solutions and giant acquirers who will just give away search technology.


Access Innovations Awarded Patent for MAIChem

October 28, 2011

Bravo to our friends at Access Innovations for receiving a U.S. patent (the company’s 19th technology patent) for MAIChem, a software-based method for searching chemical names in documents.

The company, founded in 1978, focuses on Internet technology applications and content management and enhancement. MAIChem is a tool that will be highly useful for researchers and information managers in the chemical and pharmaceutical data industries. A press release, “Access Innovations Receives U.S. Patent for Unique MAIChem™ Software Search Method: Software Provides Fast, In-Depth, Broad and Consistently Accurate Searches of Chemical and Pharmaceutical Industry Data,” shares details about the tool:

“Finding these names in documents is challenging due to the unlimited number of potential compounds and the variety of ways a compound can be named. MAIChem solves the problem by comparing the text to regular expressions that match typical chemical morphemes, such as ‘hydro’ or ‘amine,’ to see if they occur in words,” explained Marjorie M.K. Hlava, president of Access Innovations. After its initial analysis, MAIChem’s software differentiates between nonchemical words that use the morphemes and actual chemical names.
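The two-step idea in the quote (match chemical morphemes, then weed out ordinary words that happen to contain them) can be illustrated with a toy Python sketch. This is not MAIChem’s patented method, whose details the press release does not give. Only “hydro” and “amine” come from the source; the other morphemes and the `NONCHEMICAL` stoplist are hypothetical stand-ins for whatever disambiguation the product actually performs.

```python
import re

# "hydro" and "amine" come from the press release; the rest are
# illustrative assumptions added for this example.
CHEM_MORPHEMES = ["hydro", "amine", "chlor", "methyl", "benz", "oxide"]
MORPHEME_RE = re.compile("|".join(CHEM_MORPHEMES), re.IGNORECASE)

# Hypothetical stoplist of everyday words that contain a morpheme.
NONCHEMICAL = {"examine", "examined", "famine", "hydroplane", "hydrotherapy"}


def find_chemical_names(text):
    """Return words that look like chemical names under this toy scheme."""
    # Word-ish tokens; hyphens kept so names like "2-methylpropane" stay whole.
    words = re.findall(r"[\w-]+", text)
    # Step 1: regular expression match against chemical morphemes.
    candidates = [w for w in words if MORPHEME_RE.search(w)]
    # Step 2: drop ordinary words that merely share a morpheme.
    return [w for w in candidates if w.lower() not in NONCHEMICAL]


sample = ("Researchers examine dimethylamine and hydrochloric acid; "
          "the famine report never mentions benzene.")
print(find_chemical_names(sample))
# ['dimethylamine', 'hydrochloric', 'benzene']
```

A stoplist is the crudest possible second step and is used here only because it is easy to read; a fixed list cannot keep up with the open-ended naming problem Hlava describes.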

MAIChem could potentially help in numerous fields and tasks: content discovery, analysis, machine-aided indexing, and faster information retrieval. The award of this patent shows Access Innovations is bringing something unique to the table in content management. Chemistry professionals should be swooning; Access Innovations is taking it to the next level. Congratulations from the team at Beyond Search.

For more information about Access Innovations’ MAIChem, visit http://www.dataharmony.com/products/maichem.html. Now maybe the faux taxonomy experts will realize there is more to ANSI standard vocabularies than a slick marketing program and a reference to military training. We can only hope.

Andrea Hayden, October 28, 2011
