
Censorship Inputs: Filtering Content and Unintended Consequences

January 29, 2012

I find “inputs” annoying. An “input” is advice, a comment delivered in parental mode, or a suggestion that is more about the person making it than the person receiving it. Twitter is getting “inputs” about the alleged filtering of tweets in certain countries. (Keep in mind that search engines filter on a routine basis.)

No tweets needed in this woodcut of the 1844 Nativist riot in Philadelphia. Social media just accelerates information flow. A happy quack to Wikipedia.

A good example is “Letter to Twitter Executive Chairman Jack Dorsey Urging Him Not to Cooperate with Censors.” The idea is a simple one: when asked to filter content, Twitter should ignore the request. But what happens when the request is made by a governmental entity? Does Twitter ignore that governmental request? This type of blow-off sounds great sitting in a college dorm at 3 am talking about what is right and wrong. The problem is that it ignores three salient facts foremost in the minds of governmental executives around the world:

  1. Social media is the mechanism for starting and sustaining revolt. Even the Googler involved in Egypt’s transformation pointed the finger at Facebook. Facebook’s executives were half a world away and probably not thinking about the system as a mechanism for revolt.
  2. Governments are behind the curve when it comes to technology. As a result, governments and officials with power want to stop the technology in its tracks. The idea is that if a service is a problem, one can make the problem go away. That’s why India, China, and other outfits want to clamp down hard on certain content channels or at least be able to pry them open and take action if warranted.
  3. The companies want to keep earning money and keep their executives out of jail or out of harm’s way. Most of the folks providing inputs don’t know what could and may happen to a frisky executive who ignores a request from a nation state. In case you don’t know, the actions include jail time, death, harassment, and multiple actions across financial and personal spheres of behavior. This is hardball, kids, and you need to know that nation states act lawfully within their borders and have the same extra-nation state options that the US, England, Israel, and other countries do.

Here’s an example of the sort of input which can lead to some interesting situations:

We are very disturbed by this decision, which is nothing other than local level censorship carried out in cooperation with local authorities and in accordance with local legislation, which often violates international free speech standards. Twitter’s position that freedom of expression is interpreted differently from country to country is inacceptable. This fundamental principle is enshrined in the Universal Declaration of Human Rights. We call on you to be transparent about the way you propose to carry out this censorship. Posting the removal requests you receive from governments on the Chilling Effects website will not suffice to offset the harm done by denying access to content. Twitter has said that, if it receives “a valid and properly scoped request from an authorized entity,” it may respond by withholding access to certain content in a particular country, while notifying the content’s author.

I heard that one nation state turned force on a crowd of protestors. See “Chinese Troops Seal Off Tibetan Protest Region.” Quite a spicy filter in my opinion. This is the real world, and the social media that is touted as replacing search as the next big thing is fostering some interesting unintended consequences; namely, forcing governments to embrace tougher behaviors. I have generally worked for governments and law enforcement, so I am making an observation based on experience. There are two types of force: hard and soft. Filtering with software is about as soft as force gets. The hard force, on the other hand, is not something most readers of this blog want to experience as a receiver of input with intent.

Remember: I am okay with a person making inputs. I am not okay with the assumption that a commercial enterprise is going to be able to do the college dorm version of the “right thing.” Missing a class is one thing. Getting arrested, killed, or becoming the focus of a disinformation attack is another.

Finding is one thing. Inciting is quite another. Lowest common denominator, consumerization, commoditization—describe it as you will. There are interactions in the real world that don’t exist in a philosophical discussion among soon to be unemployable students.

Stephen E Arnold, January 29, 2012

Sponsored by Pandia.com

Googzilla Gets Social

January 11, 2012

I scanned the “official” line of Google’s most recent social play. I flipped through the long list of comments, views, opinions, etc. My reaction? What’s the big surprise? Here’s an anchor post: “Antitrust+,” which appeared in Parislemon. The main idea seems to be that pundits recognize Google, an outfit I called Googzilla back in 2005, is doing the beaver thing. (The notion of Googzilla originated from my research which revealed that Google believed that its “system” would provide the underpinnings for most business processes. Therefore, search was the new infrastructure. When I used this reference in a talk in London, the Googler on the panel with me said, “Cool.” Googzilla is just a big beaver, doing its beaver thing.) You may recall the adage, “Beavers do what beavers do.” Put the beaver in the kitchen of the Cast Iron Grill in Harrod’s Creek, Kentucky, and the beaver starts building a dam. Why? That’s what beavers do. Easy to predict because beavers do their thing. Here’s evidence of the Google-beaver similarity:

Google is using Search to propel their social network. They might say it’s “not a social network, it’s a part of Google”, but no one is going to buy that. They were late to the game in social and this is the best catch-up strategy ever. Given that it’s opt-out, I’m just not sure that this is all that different from Microsoft bundling IE with Windows.

Google is doing the social thing, not because Google is social. Google is doing social in order to remain relevant to the Facebook, Twitter, LinkedIn users. In these systems, content from humans is perceived to be more accurate, less biased, and generally more useful than a list of results in which ads, content, red herrings, and even malware lurk. Hey, some users seem to think, the social information is just “better.” When the user is looking for a short cut, getting mis- or dis-information from a “friend” is probably a better bet than taking what a non-social system generates.

Beavers do what beavers do. Why does one expect the beaver to build a computer when beavers build dams?

My view is that most of the free content available on the Web is dicey stuff. Most users today, including recent library school graduates, lack the skills to distinguish accurate content in most topic areas from distorted content with bent or shaped “facts” and from content with mixed semantic or sentiment coloring, or to identify the most relevant document for a particular query.

In short, “beavers do what beavers do” applies to Google, but the adage also applies to users who take what systems give them because advertisers and other funding sources foot the bill. Ask yourself these questions:

  1. When I am looking for information, do I consult multiple commercial databases, review a representative selection of the documents, and make judgments about which documents warrant further investigation?
  2. When consuming results from any free online system, do I routinely verify facts by looking for another source which can verify the data in which I have an interest?
  3. When accepting “hits” from predictive systems, do I run the same query on another predictive system and evaluate the outputs?
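The third question can be made concrete. The sketch below is a minimal illustration, assuming each system’s output is available as an ordered list of document identifiers; the function name and sample lists are hypothetical. It measures how much the top results of two systems agree using Jaccard overlap.

```python
def top_k_overlap(results_a: list[str], results_b: list[str], k: int = 10) -> float:
    """Jaccard overlap between the top-k results of two systems (1.0 = same set)."""
    top_a = set(results_a[:k])
    top_b = set(results_b[:k])
    if not top_a and not top_b:
        # Two empty result lists are treated as full agreement.
        return 1.0
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical result lists for the same query run on two systems.
system_a = ["doc1", "doc2", "doc3", "doc4"]
system_b = ["doc3", "doc1", "doc5", "doc6"]
print(round(top_k_overlap(system_a, system_b, k=4), 2))  # 0.33
```

A low score signals that the two systems disagree and that the outputs deserve a closer look. Jaccard ignores rank order, so a rank-aware measure would be a reasonable refinement.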

I know from information gathered as recently as last week that few, if any, recent library school graduates perform these actions.

So Google is getting social because:

  1. Facebook and other “real” competitors are nibbling into Google’s revenue growth system. In 2006, Google had essentially zero competitors. Today, Google is in an uncomfortable position. Amazon, Apple, Facebook, and even the once presumed terminal Microsoft are posing problems, big problems. Google’s management is responding with “me too” solutions in the hopes that sheer imitation will solve the competitive gap problem. The beaver is doing what the beaver does.
  2. Google’s gravity-free run is now carrying the ballast of staff retention. With the big paydays coming to employees of pre-IPO companies, 13-year-old outfits don’t have that old hiring magnetism any longer. As a result, Google cannot innovate and disrupt. Google is now in the imitate and disrupt mode in my opinion. Aging beavers do what aging beavers do; that is, look for short cuts.
  3. Google must push through increasing friction. The resistance is coming from regulators who can be “managed,” but that takes time, mental resources, and effort. No problem, but with legal hassles on every continent except Antarctica, Google finds the legal tar getting harder. Other factors bumping up the coefficient of friction at Google are the cutbacks, the about-faces, and the multi-front product and service wars the company is fighting. Even beavers grow careless. I saw a squashed one on the way to the post office yesterday.

Wow, I bet everyone using social media for information wishes that the traditional method of research were back in vogue. Online services reflect the user. In short, beavers do what beavers do, and today beavers don’t do “get your hands dirty” research. How inefficient! Let’s get social to find the “truth”. That works?

I find Google interesting and one can make its public search system deliver high value results. However, most online users just accept what the system outputs. When I was younger, I worried that commercial online services like Dialog and LexisNexis would manipulate results to suit their corporate purposes. As risky as placing trust in a commercial online service may be, Dialog and LexisNexis made no effort to filter the content generated by commercial database producers. In fact, the systems made it possible to run a query across multiple commercial files using the 411 command or to run comprehensive searches across a corpus of third party content. It took time and effort to grind through these outputs, but the effort would yield insights, suggestions for further research, and often make visible unintentional or factual errors. In our Business Dateline database, we went so far as to include post publication corrections to the full text article. The idea was to make it clear that even commercial publishers make mistakes, often really big ones.
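The cross-file approach is easy to illustrate. The sketch below is not Dialog’s implementation; it simply assumes each file is a list of document texts (the file names, documents, and function names are hypothetical) and reports, per file, how many documents match every query term, so a searcher can see where to dig first.

```python
import re


def tokens(text: str) -> set[str]:
    # Lowercased word tokens; punctuation is discarded.
    return set(re.findall(r"\w+", text.lower()))


def hit_counts(files: dict[str, list[str]], terms: list[str]) -> dict[str, int]:
    """Count, per file, the documents that contain every query term (Boolean AND)."""
    wanted = {term.lower() for term in terms}
    counts = {
        name: sum(1 for doc in docs if wanted <= tokens(doc))
        for name, docs in files.items()
    }
    # List the most productive files first so the searcher knows where to dig.
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))


# Hypothetical files, each a list of document texts.
files = {
    "FileA": ["Oracle buys Endeca", "Endeca faceted search"],
    "FileB": ["SAP TREX search"],
    "FileC": [],
}
print(hit_counts(files, ["Endeca"]))  # {'FileA': 2, 'FileB': 0, 'FileC': 0}
```

The hit counts only tell the searcher where to look; the grinding through outputs that the paragraph describes still has to happen file by file.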

Today, the online consumer is getting exactly what the online consumer wants. The content finding systems are not built to deliver accurate, unbiased results. The majority of online users want answers, not the time-consuming, intellectually exhausting task of figuring out the provenance and accuracy of information. Who wants to do library research and mind-numbing data analysis? I want the equivalent of ESPN Newscenter so I “know” what happened in sports. Who has time to watch the games? Why read “long form” content when one can snag information via Flipboard and Pulse?

So let’s knock off the worry about Google and its incursions into social. Put that effort into performing rigorous searching. When the users shift from taking spoon-fed baby food content to more substantive fare, then Google as well as other online services will adapt.

Perhaps this type of sign should be posted on search result pages from ad supported online research services? Image source: http://www.graphicshunt.com/funny/images/stupidity-13135.htm

Right now, Google is doing what beavers do. Users are doing what users do. Hard work, fact-based analysis, and exercising judgment are not driving online. Distraction, ease of use, and fast, fun information access are driving beavers into a frenzy.

Beavers do what beavers do. One can’t change Mother Nature. Complaining about Googzilla is pretty much a waste of energy which can be better spent with more rigorous research. Wow, that will be popular with today’s “average” user looking for pizza in all the wrong places.

Stephen E Arnold, January 11, 2012

Sponsored by Pandia.com, a Web site run by information professionals

SAP: Long and Winding Road for Search

January 5, 2012

In one of the early editions of the Enterprise Search Report, that white elephant of 600 pages containing profiles of more than two dozen vendors, I described TREX, a nifty algorithm for Text Retrieval and Information Extraction. (The link is to the Wikipedia write up, however.) For those of you who are new to search, TREX is not the creature you wished you had as a pet when you were eight years old. The SAP TREX is a natural language processing search and retrieval system which was mostly home grown. Keep in mind that SAP owns the Inxight entity extraction and server technology developed by the adepts at Xerox PARC. I interviewed one of the developers, profiled the system’s approach to content processing, and pointed out that search was a killer in the SAP R/3 environment for three reasons:

  1. SAP assigns its own spiffy metadata to content objects, storing these in the wild and wonderful proprietary R/3 environment.
  2. SAP systems took and probably still take a long time to plan, implement, and impose on the client. My understanding is that the client does not tell SAP how the client likes to work. SAP tells the client how the client will work with the SAP system and method. Nifty for sure.
  3. SAP systems have struggled with a wide range of performance “opportunities.” The idea is that when something goes slowly, then the client has the “opportunity” to make changes which will speed up the large, IBM-inspired system.

A few years ago, before Endeca became the new billion-dollar toy at Oracle, Endeca accepted cash infusions from outfits hooked up with Intel (yep, the company with the vision that its chips could crush any computational problem because they were so darned fast) and SAP’s investment unit (an outfit allegedly looking at ways to give SAP a leg up on the future). After watching Endeca do its recursive indexing and faceting processes, Intel and SAP shifted gears. Endeca, as you know, is now part of Oracle along with TripleHop (clustering and indexing), InQuira (natural language processing from two predecessor companies), RightNow (also infused with search technology), Artificial Linguistics, PL/SQL’s wonky command-driven search, and probably some technologies I either don’t know about or have forgotten due to advancing senility.

Will SAP slip and fall with its information retrieval solutions? A happy quack to the image source http://personalinjuryclaims1.co.uk/fall-claims/

When you want to run search within an SAP environment, many folks just embrace one of the SharePoint solutions, give TREX a go, or license a system which is compatible with some of the SAP processed content. In short, SAP’s approach to search is not much different from IBM’s or Microsoft’s.

The question to consider is, “What’s next for SAP?”

Several observations:

First, SAP has to pump money into TREX to keep the system in step with today’s information demands. With SAP dabbling in open source and focusing on higher margin products and services, TREX is probably not the long haul solution for SAP. Home grown search is too expensive.

Second, SAP continues to poke around open source software. At some point, SAP may follow in the footsteps of the company which inspired SAP in the first place—IBM. Lucene and Solr look like possible options. This is a trend to watch.

Third, SAP could buy or tie up with one of the workmanlike search vendors. SAP could either sign a deal to use a third-party system on some basis or just buy one of the dozens of information retrieval vendors who are looking for a financial white knight. Despite the chatter about search, many search and retrieval companies are gasping for oxygen. SAP may have a tank and a breathing mask.

What’s my view? Well, since I am a mercenary goose, I don’t have an official opinion. I do find it fascinating that SAP has not moved aggressively to the Lucene Solr solution. So for now, I am going out of town and will wait until my Overflight service provides some solid data about SAP’s next move.

Hopefully it will be more artfully crafted than SAP’s pricing and customer service activities in the last two or three years.

Stephen E Arnold, January 5, 2012

Sponsored by Pandia.com

Open Access Threatened by Elsevier Backed Legislation

January 3, 2012

Academic publishing, specifically in the fields of science and math, is a big money industry. The whole system hinges on containing the flow of information, a task that grows increasingly difficult with the demand for free access to information. Free access is fueled by the internet and social media, with these influences creating a new generation of young people who assume and demand that information be free. Arxiv.org is an open access archive for academic literature devoted to math and science. It and other open access portals are being threatened by potential legislation. (Open access is a term referring to quality information sources that are not protected by a subscription.) The Quantum Pontiff tells us more in, “Could Elsevier Shut Down Arxiv.org?”

The blogger reports:

They (Elsevier) haven’t yet, but they are supporting SOPA, a bill that attempts to roll back Web 2.0 by making it easy to shut down entire sites like Wikipedia and Craigslist if they contain any user-submitted infringing material.


Splash page of arxiv.org shows the seal of Cornell University and the phrase “We gratefully acknowledge supporting institutions.” See http://arxiv.org/

Social media and copyright are inherently opposing concepts. User-submitted material, as it is referred to above, will almost always infringe upon copyright. In fact, apart from users’ own thoughts and words, very few submissions will not infringe upon copyright. If the legislators supporting SOPA (Stop Online Piracy Act) make good on all their promises, eventual showdowns with social media heavy hitters like Facebook or YouTube could occur.

American copyright was established by the founding fathers in our Constitution to balance the protection of intellectual property with the ability to foster creativity and innovation. However, copyright has evolved in the modern era into a blanket protection policy, primarily serving corporations. Libraries and other institutions of learning champion the cause of open access, but even these civic organizations are threatened by corporate lobbyists in their constant quest to have copyright protection extended tighter and longer.


Predicting Failure: Pot Calls Kettle Black and Blue

January 2, 2012

Fascinating is traditional media’s ability to attack a hopelessly confused big corporation for a failure. The failure documented by the New York Times was Hewlett Packard’s immolation of its mobile strategy. The outfit doing the criticizing—what I call the pot calling the kettle gray lady black and blue—is the New York Times. Ah, irony.

Which is more flawed, the management of HP or the management of the New York Times? Let me try to remember. The New York Times lost its top manager and its head of digital stuff. The home delivery rate is nudging close to $700 a year. The Safari loophole makes its digital content free. The company has muffed the bunny with its indexing, its About.com property, and just about every financial knob and dial setting available.

HP, on the other hand, has engaged in improper behavior, the CEO revolving door game, the tablet fiasco, and the open sourcing of a $1.0 billion-plus investment. HP bought Autonomy for $10 billion, creating a mini cash concern for some Wall Street types.

Sounds like a pretty even game of management.

Now to the business at hand: “In Flop of H.P. TouchPad, an Object Lesson for the Tech Sector.” (If the link goes dead, just use Safari. Access to NYT content seems to be “free”. Nifty, eh?) What is the New York Times suggesting? For me, the write up is more about the New York Times itself than about Hewlett Packard. Three points:

  1. HP created a flop due to various management mistakes. Okay, sounds like the NYT’s problem.
  2. HP had a good idea but it “was ahead of its time”. Right. The NYT had a deal with LexisNexis which worked pretty well, but not well enough. So the NYT decided it could go it alone. It was, as the NYT says, “ahead of its time.” No kidding.
  3. HP faced a problem with newcomers who dominated a market. Check. Same with the NYT and its various digital efforts. Being good at one thing does not mean that one is good at another thing.

My take? The NYT is trying to be just like the Harvard Business Review, adding value to what is not even a news story any longer. Going down this path ignores some of the basics of creating high value business and management analysis. The information is not what makes money. It is the other revenue streams. The NYT will learn, as Time and Newsweek have, that trying to up one’s intellectual game does not automatically make the money flow or the analysis insightful. Business information is often a loss leader or a way to generate consulting revenue.

The write up does explain how the NYT sees the woes of other companies. That is indeed interesting. I wonder if the NYT team remembers its original online search service. I bet Jeff Pemberton does.

Stephen E Arnold, January 2, 2012

Sponsored by Pandia.com

Big Data Analytics and Sense Making with Synthesys

December 19, 2011

Tim Estes is the CEO and co-founder of Digital Reasoning. Digital Reasoning develops and markets solutions that provide Automated Understanding for Big Data.

There’s a great deal of talk about “big data” today. If you walk into an AT&T store near you, you may see statistics showing users sending more than 3 billion text messages a day and more than 250 million tweets. Compare that with roughly 100 million tweets a day or fewer a year or two ago, and it’s daunting how rapidly the volume of digital information is increasing. A mobile phone without expandable storage frustrates users who want to keep a contacts list, rich media, and apps in their pocket. In organizations, the appetite for storage is significant. EMC, Hewlett Packard, and IBM are experiencing strong demand for their storage systems. Cloud vendors such as Amazon and Rackspace are also experiencing strong demand from companies offering compelling services to end users on their infrastructure. At a recent Amazon conference in Washington, Werner Vogels revealed that the AWS Cloud has hundreds of thousands of companies/customers running on it at some level. Finally, companies like Digital Reasoning are working on the next generation of Cloud, automated understanding, which shifts the focus from infrastructure to making sense of the data that sits in hosted or private clouds.

While most of the attention has been on infrastructure like virtualization / hypervisors, Hadoop, and NoSQL data storage systems, we think those are really the enablers of the killer app for Cloud, which is making sense of data to solve information overload. Without next-generation analytics and supporting technology, it is essentially impossible to:

  • Analyze a flow of data from multiple sensors deployed in a factory
  • Process mobile traffic at a telephone company
  • Make sense of unstructured and structured information flowing through an email system
  • Identify key entities and their importance in a stream of financial news and transaction data.

These are the real world problems that have engaged me for many years. I founded Digital Reasoning to automatically make sense of data because I believed that someday all software would learn and that would unleash the next great revolution in the Information Age. The demand for this revolution is inevitable because while data has increased exponentially, human attention has been essentially static in comparison. Technology to create better return on attention would go from “nice to have” to utterly essential. And now, that moment is here.

Digging a little deeper, Digital Reasoning has created a way to take human communication and use algorithms to make sense of it without having to depend on a human design, an ontology, or some other structure. Our system looks at patterns and the way a word is used in its context and bootstraps the understanding much like a human child does – creating associations and building into more complex relationships.
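Digital Reasoning’s actual algorithms are not described here, so the following is only a generic illustration of the distributional idea behind learning meaning from usage in context. It assumes a fixed co-occurrence window and cosine similarity; the function names and the toy corpus are hypothetical. Words used in similar contexts end up with similar context vectors, with no ontology supplied up front.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus (hypothetical): "bank" and "lender" appear in similar contexts.
corpus = [
    "the bank approved the loan",
    "the lender approved the loan",
    "the river bank flooded",
]
vectors = context_vectors(corpus)
print(cosine(vectors["bank"], vectors["lender"]) > cosine(vectors["bank"], vectors["flooded"]))  # True
```

Production systems add weighting (for example, discounting very common words such as “the”) and far larger corpora; the sketch only shows how associations can emerge from usage alone.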

In 2009, we migrated onto Hadoop and began tackling the problem of managing very large-scale unstructured data, with the aim of moving the industry beyond counting well-structured things and toward figuring out exactly what the data you are measuring means.

Digital Reasoning asks the question: “How do you take loose, noisy information that is disconnected and unstructured and then make sense of it so that you can then apply analytics to it in a way that is valuable to business?”

We identify actors, actions, patterns, and facts and then put them into the context of space and time in an efficient and scalable way. In the government scenario, that can mean finding and stopping bad guys. In the legal environment, the goal is to answer the questions of “who”, “what”, “where”, and “when”.

Digital Reasoning initially focused on the complex task of making sense of massive volumes of unstructured text within the US Government Intelligence Community after the events of 9/11. But we also believe that our Synthesys software can be used in the commercial sector to create great value from the mountains of unstructured data that sit in the enterprise and stream in from the Web.

Companies with large-scale data will see value in investing in our technology because they cannot hire 100,000 people to go through and read all of the available material. This matters if you are a bank and trying to make financial trades. This matters for companies doing electronic discovery. This matters for health sectors that need help organizing medical records and guarding against fraud.

We are an emerging firm, growing rapidly and looking to have the best and the brightest join our quest to empower users and customers to make sense of their data through revolutionary software. With the recent investment from In-Q-Tel and partners of Silver Lake, I believe that Digital Reasoning has a great future ahead. We are on the bleeding edge of what is going on with Hadoop and Big Data in the engineering area and how to make sense of data through some of the most advanced learning algorithms in the world. Most of all we care that people are empowered with technology so that they can recover value and time in the race to overcome information overload.

To learn more about Digital Reasoning, navigate to our Web site and download our white paper.

Tim Estes, December 19, 2011

Sponsored by Pandia.com

The Future of Computing: Forget Search?

December 6, 2011

I opened my dead tree version of the New York Times a few minutes ago. I noticed an insert called “Science Times: The Future of Computing.” You may be able to find the December 6, 2011, story at this link. No promises, however.

I found the collection of articles and essays interesting. I suppose “interesting” is a poor word choice. The collection covers start ups, the Africa meme, quantum computing, artificial intelligence (an oxymoron I have heard), online instruction (bad news for some traditional educational materials’ business models I believe), a “programmable universe” (another notion which would be fun to discuss in Philosophy 101), biocomputing, security, open source, and a look at how computing is so important.

I have zero inputs to these polished, shaped, and New York midtown write ups. The point of the exercise, I believe, was finding the buttons to push at General Electric to get the two-page spread which told me:

We power. We are making energy independence a reality. From cutting edge thin film solar panels to advanced gas turbines, we created the high-tech machines that create over a quarter of the world’s energy…

My reaction to the collection of essays in the “special” section was threefold.

First, search, findability, and information access are not concepts which made the starting team in the articles and essays. In fact, I had a tough time locating the link to the special section itself, but that type of intellectual exercise is not one that concerns most of the traditional publishing companies covering technology. The collection and its inserted advertisement seem to lack an integrating hook. In my world, the notion of integration is a pretty big idea.

Second, the special section lacked a message. After working through the “real” outputs from “real” writers, I wondered what might have been done to string these gems on a necklace. The reader would then have been able to enjoy each gem and marvel at the beauty of the necklace. Someone in that Philosophy 101 class would have offered up gestalt, but not the addled goose. I just know when a collection lacks unity.

Third, is GE the “right” advertiser? I read the ad and asked myself two questions:

  1. Isn’t the solar industry in a bit of a tailspin? Forget Solyndra. There are other economic forces which prevent my neighbors from kicking the gas and traditional electric company habit in favor of solar technology.
  2. The energy point baffled me. Who, I kept wondering, supplied the Fukushima reactors? I mean there were fuel pools to the left and fuel pools to the right. Then there were some fuel rods on the roof, almost out of sight.

Interesting special section. Too bad search did not make the cut. It would have been interesting to read what the public relations firms for Google, Microsoft, and Yandex (Blekko) would have said about the future. I would also have enjoyed a write up by Jon Kleinberg, whose team has found some interesting information in posted Flickr pictures. But with search on the outs in the New York knowledge value world, I will just put my fins in the water and take a paddle around the pond filled with mine runoff water. None of that coal has anything to do with certain large firms which produce “over a quarter of the world’s energy.” I will consult a mobile device and run a query. The system will “know” what I want better than I do. Artificial intelligence. Just great. Just not search and retrieval or research. Who needs research?

Stephen E Arnold, December 6, 2011

Sponsored by Pandia.com

Search Acquisitions

November 18, 2011

One of my two or three readers sent me a link to “Acquisition: The Elephant in the Meeting Room.” I don’t have strong feelings one way or the other about Mongoose, the write up, or the enterprise search sector. I have identified some of the buzzwords used to dance around the little-discussed problem of lousy enterprise search systems. If you want to catch up on the obfuscation in which marketers and “real” consultants are entangled, you may find “Search Silver Bullets, Elixirs, and Magic Potions: Thinking about Findability in 2012” a thought starter.

The main point of the Elephant article, it seems to me, is summarized in this passage:

Should you be wary of acquisitions? Not as much as you might read in the blogs and professional communities.

The write up mentions a number of high-profile acquisitions and provides some color for the reasons behind the deals. My view of some of the recent deals is different from the Mongoose write up. I suppose that, at age 67, I have watched and participated in the sale of large and small companies. I learned in my work at Booz, Allen & Hamilton, before it became an azure chip firm, that the reasons for a corporate action are often difficult to discern from the outside looking in.

The table below provides a rundown of my personal take on why certain deals took place.


Business Process Management: Bit Player or Buzz Word?

November 7, 2011

I spoke with one of the goslings who produces content for our different information services. We were reviewing a draft of a write up, and I reacted negatively to the source document and to the wild and crazy notions that find their way into the discussions about “problems” and “challenges” in information technology.

In enterprise search and content management, flag waving is more important than solving customers’ problems. Economic pressure seems to amplify the marketing clutter. Are companies with resources “too big to flail”? Nope.

Here’s the draft, and I have put in bold face the parts that caught my attention and prompted my push back:

As the amount of data within a business or industry grows, the question of what to do with it arises. The article, “Business Process Management and Mastering Data in the Enterprise“, on Capgemini’s Web site explains why Business Process Management (BPM) is not the ideal means for managing data.

According to the article, as more and more operations are used to store data, the process of synchronizing the data becomes increasingly difficult.

As for using BPM to do the job, the article explains,

While BPM tools have the infrastructure to hold a data model and integrate to multiple core systems, the process of mastering the data can become complex and, as the program expands across ever more systems, the challenges can become unmanageable. In my view, BPMS solutions with a few exceptions are not the right place to be managing core data. At the enterprise level MDM solutions are far more elegant solutions designed specifically for this purpose.

The answer to this ever-growing problem was found by combining knowledge from both a data perspective and a process perspective. The article suggests that a Target Operating Model (TOM) would act as a rudder for the projects aimed at synchronizing data. Once that was in place, a common information model would be created with enterprise definitions of the data entities, which would then be populated with general attributes fed by a single process project.

While this is just one man’s answer to the problem of data, it is a start. Regardless of how businesses approach the problem, it remains constant: process management alone is not efficient enough to meet the demands of data management.

Here’s my concern. First, I think there are a number of concepts, shibboleths, and smoke screens flying, floating, and flapping. The conceptual clutter is crazy. The “real” journalists dutifully cover these “signals”. My hunch is that most of the folks who like videos gobble these pronouncements like Centrum multivitamins. The idea is that one dose with lots of “stuff” will prevent information technology problems from wreaking havoc on an organization.

Three observations:

First, I think that in the noise, quite interesting and very useful approaches to enterprise information management can get lost. Two good examples are PolySpot in France and Digital Reasoning in the U.S. Both companies have approaches which solve some tough problems. PolySpot offers an infrastructure, search, and apps approach. Digital Reasoning delivers next-generation numerical recipes, what the company calls entity based analytics. Baloney like Target Operating Models does not embrace these quite useful technologies.

Second, the sensitivity of indexes and blogs to public relations spam is increasing. The perception that indexing systems are “objective” is fascinating but incorrect. What happens then is that a well heeled firm can output a sequence of spam news releases and then sit back and watch the “real” journalists pick up the arguments and ideas. I wrote about one example of this in “A Coming Dust Up between Oracle and MarkLogic?”

Third, I am considering a longer essai about the problem of confusing Barbara, Desdemona’s mother’s maid, with Othello. Examples include confusing technical methods or standards with magic potions; for instance, taxonomies as a “fix” for lousy findability and search, semantics as a work around for poorly written information, metatagging as a solution to context free messages, etc. What’s happening is that a supporting character, probably added by the compilers of Shakespeare’s First Folio edition, is made into the protagonist. Since many recent college graduates don’t know much about Othello, talking about Barbara as the possible name of the man who played the role in the 17th century is a waste of time. The response I get when I mention “Barbara” when discussing the play is, “Who?” This problem is surfacing in discussions of technology. XML, for example, is not a rabbit from a hat. XML is a way to describe the rabbit-hat-magician content and slice and dice the rabbit-hat-magician without too many sliding panels and dim lights.

What drives all this management and method malarkey? Sales, gentle reader, sales. Hyperbole, spam, and jargon are the Teflon that slides a deal through.

Stephen E Arnold, November 7, 2011

Sponsored by Pandia.com

The Future of Search Not

October 27, 2011

We received an email from one of my one or two readers pointing me to “The Future of Search” by Martin Belam at Enterprise Search Europe. Good points, but in my opinion, the functions describe a world which is hostile to search dinosaurs. Maybe the hip crowd is into this particular “expert’s” vision of search. I am not.

In the hyperlinked write up, the author pointed out three “items” which appear to make clear a topic I find quite unclear. My reaction was that these items do not capture search either of the moment or some “to be” world where content management experts, governance specialists, and “real” journalists look for information. The items described a future that underscores a conceptual problem in thinking about information retrieval.

There was the obligatory reference to UX, Microsoft’s horrible compression of the phrase “user experience.” In my parlance, this is the kindergarten, razzle dazzle interface of video games. Angry Birds is great for someone who needs distraction. For search, UX is an issue. The flashy interface may disguise flawed, incomplete, or manipulated result sets. Eye candy is not information by default. Confusing paint with the mechanical soundness of the vehicle may be a problem for some people.

There was acknowledgment that search is going mobile. What is important about mobile is that the user is pulled into what I call “shortcut land.” Forget the codes that whisk one to a Web page. The notion of predictive search involves algorithms and engineers who determine thresholds for smart software. When systems do the thinking, will the Gen X and Gen Y folks make better decisions? Hard to say, but they will be in a more controlled and monitored decision environment. Happy there?

Finally, the future of search will involve touch. Frankly, I don’t want to search using “touch”. Google has already used its usage data to kill off Boolean logic. Without Boolean there are more opportunities to put ads in front of users who get a bigger, fuzzier result set. I want to craft a query and launch it against a corpus of content that has an editorial policy. I do not want to point at a facet. I want to obtain on-point information in a “hands on” manner. I want to paw, not touch.

To sum up, if I read the article correctly, search is not just dead. Search has been forgotten. Even more interesting is that the discussion of search has little to do with the need for a person to locate unbiased information with precision and recall.

If this is the future of search, I want none of it. As one colleague quipped, “Don’t fail to miss it.”

Done.

Ken Toth, October 27, 2011

Sponsored by Pandia.com
