A Wrinkle in the Government Procurement Envelope
March 3, 2009
Government agencies buy quite a bit of hardware, storage, and systems to deal with digital information. I avoid Washington, DC. I went to grade school there. I fought traffic on I-270 when I worked in the city for a decade after getting booted from a third-tier university. I then did the SDF to BWI run on Southwest for five or six years when I was hooked up with a government-centric services firm. I don't know that much about procurements, but I do know when what looks like a trivial event could signal a larger shift.

You can take a look at the ComputerWorld story "DOJ Accuses EMC of Improper Pricing" here. If I were writing the headline, I would have slapped in an "allegedly". Keep in mind that I am reporting secondhand news and offering a comment. I am not sure how accurate this DOJ (Department of Justice) matter is or how much oomph it has. The thrust of the story is that DOJ is sniffing into payments and tie-ups. Now, most folks in Harrods Creek, Kentucky, don't pay much attention to the nuances of Federal acquisition regulations. Let's assume that this is little more than a clerical error. But in my opinion this single matter signals a tougher line on how companies that manufacture or create hardware and software deal with the government.

Some organizations sell direct to the government; others take the lead and turn it over to partners. The relationships among the manufacturers, the partners, and the government are a wonderland of interesting activities. Why is this important? Search vendors operate in different ways, and some systems trigger significant hardware acquisitions. With a massive Federal deficit, I wonder, "Is this single alleged action a harbinger of closer scrutiny of some very high-profile companies' business dealings?" My hunch is, "Yep." Some companies will want to tidy their business processes. When rocks get flipped over, some interesting things can be spotted.

One major search vendor does not sell directly to the US government. The vendor deals through partners. Some partners are loved more than others. My thought is that if I were investigating these tie-ups, I would prefer to see partners treated in an equitable way, with documentation that backs up the compensation, limits, and responsibilities with regard to the US government and the source of the hardware or software. If the system is "informal", I would dig a little deeper to make sure that US government procurement guidelines were followed to the letter. Just my opinion. I might come out of retirement to do some old-time procurement fact finding when spring comes.
Stephen Arnold
Mysteries of Online 9: Time
March 3, 2009
Electronic information has an interesting property: time distortion. The distortion has a significant effect on how users of electronic information participate in various knowledge processes. Information carries humans along much as a stream whisks a twig in the direction of the flow. Information, unlike water, moves in multiple directions, often colliding, sometimes reinforcing, and at other times behaving in paradoxical ways that leave a knowledge worker dazed, confused, and conflicted. The analogy of information as a tidal wave connotes only a partial truth. Waves come and go. Information flow for many people and systems is constant. Calm is tough to locate.
Vector fields. Source: http://www.theverymany.net/uploaded_images/070110_VectorField_test012_a-789439.jpg
In the good old days of cuneiform tablets, writing down the amount of wheat Eknar owed the king required specific steps. First, you had to have access to suitable clay, water, and a clay-kneading specialist. Second, you needed to have a stylus of wood, bone, or maybe the fibula of an enemy removed in a timely manner. Third, you had to have your data ducks in a row. Dallying meant that the clay tablet would harden and make life more miserable than it already was. Once the document was created, the sun or kiln had to cooperate. Once the clay tablet was firm enough to handle without deleting a mark for a specified amount of wheat, the tablet was stacked in a pile inside a hut. Fourth, to access the information, the knowledge worker had to locate the correct hut, find the right pile, and then inspect the tablets without breaking one, a potentially bad move if the king had a short temper or needed money for a war or a new wife.
In the scriptorium in the 9th century, information flow wasn't much better. The clay tablets had been replaced with organic materials like plant matter or, for really important documents, the scraped skin of sheep. Keep in mind that other animals were used. Yep, human skin worked too. Again, time-intensive processes were required to create the material on which a person would copy or scribe information. The cost of the materials made it possible to get patrons to spit out additional money to illustrate or illuminate the pages. Literacy was not widespread in the 9th century, and there were a number of incentives to get sufficient person power to convert foul papers to fair copies and then to compendia. Not just anyone could afford a book. Buying a book or similar document did not mean the owner could read. The time required to produce hand copies was somewhat better than the clay tablet method or the chiseled inscriptions or brass castings used by various monarchs.
Yep, I will have it done in 11 months, our special rush service.
With the invention of printing in Europe, the world rediscovered what the Chinese had known for 800, maybe a thousand, years. No matter. The time required to create information remained the same. What changed was that once a master set of printing plates had been created, a printer with enough capital to buy paper (cheaper than skin, longer lasting than untreated plant fiber, and less ink hungry than linen-based materials) could manufacture multiple copies of a manuscript. The out-of-work scribes had to find a new future, but the impact of printing was significant. Everyone knows about the benefits of literacy, books, and knowledge. What's overlooked is that the existence of books altered the time required to move information from point A to point B. Once time barriers fell, distance compressed as well. The world became smaller if one were educated. Ideas migrated. Information moved around and had impact, which I discussed in another Mysteries of Online essay. Revolutions followed after a couple hundred years, but the mindless history classes usually ignore the impact of information on time.
If we flash forward to the telegraph, time accelerated. Information no longer required a horseback ride, walk, or train ride from New York to Baltimore to close a real estate transaction. Once newfangled electricity fell in love with information, the speed of information increased with each new innovation. In fact, more change in information speed has occurred since the telegraph than in all previous human history. The telephone gave birth to the modem. The modem morphed into a wireless USB 727 device along with other gizmos that make possible real-time information creation and distribution.
Time Earns Money
I dug out notes I made to myself sometime in the 1982 – 1983 time period. The implications of time and electronic information caught my attention for one reason. I noted that the revenue derived from a database with weekly updates was roughly 30 percent greater than the revenue derived from the same database on a monthly update cycle. So four updates a month yielded $1.30 where one update yielded $1.00. I wrote down, "Daily updates will generate an equal or greater increase." I did not believe that the increase was infinite. The rough math I did 25 years ago suggested that with daily updates the database would yield about 1.6 times the revenue of the same database with a monthly update cycle. In 1982 it was difficult to update a commercial database more than once a day. The cost of data transmission and service charges would gobble up the extra money, leaving none for my bonus.
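Here is a minimal sketch of that back-of-the-envelope model in Python. The multipliers are the illustrative figures from the note above treated as assumptions, not measured data; the daily figure in particular is read as roughly 1.6 times the monthly baseline.

```python
# Back-of-the-envelope model of update frequency versus online database
# revenue. The multipliers are illustrative assumptions from the essay,
# not measured figures.

MONTHLY_BASELINE = 1.00      # revenue index for a monthly update cycle
WEEKLY_MULTIPLIER = 1.30     # weekly updates observed at roughly +30 percent
DAILY_MULTIPLIER = 1.60      # assumed (finite) ceiling for daily updates

def revenue_index(update_cycle: str) -> float:
    """Return a relative revenue index for a given update cycle."""
    multipliers = {
        "monthly": MONTHLY_BASELINE,
        "weekly": MONTHLY_BASELINE * WEEKLY_MULTIPLIER,
        "daily": MONTHLY_BASELINE * DAILY_MULTIPLIER,
    }
    return multipliers[update_cycle]

if __name__ == "__main__":
    for cycle in ("monthly", "weekly", "daily"):
        print(f"{cycle:>8}: {revenue_index(cycle):.2f}")
```

The point of the sketch is the shape of the curve, not the exact numbers: each step up in update frequency adds revenue, but the increase is not infinite.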
In the financial information world, speed and churn are mutually reinforcing. New information makes it possible to generate commissions.
Time, therefore, not only accelerated the flow of information. Time could also accelerate earnings from online information. Simply by updating a database more frequently, the database would generate more money. Update the database less frequently, and the database would generate less money. Time had value to the users.
I found this an interesting lesson, and I jotted it down in my notebook. Each of the commercial databases in which I played a role was designed for daily updates and, later, multiple updates throughout the day. To this day, the Web log in which this old information appears is updated on a daily basis, and several times a week it is updated multiple times during the day. Each update carries an explicit time stamp. This is not for you, gentle and patient reader. The time stamp is for me. I want to know when I had an idea. Time marks are important as the speed of information increases.
Implications
The implications of my probably third-hand insight included:
- The speed-up in dissemination means that information impact is broader, wider, and deeper with each acceleration.
- Going faster translates to value for some users who are willing and eager to pay for speed. The idea is that knowing something (anything) first is an advantage.
- Fast is not enough. Customers addicted to information speed want to know what’s coming. The inclusion of predictive data adds another layer of value to online services.
- Individuals who understand the value of information speed have a difficult time understanding why more online systems and services cannot deliver what is needed; that is, data about what will happen with a probability attached to the prediction. Knowing that something has a 70 percent chance of taking place is useful in information-sensitive contexts.
Let me close with one example of the problem speed presents. The Federal government has a number of specialized information systems for law enforcement and criminal justice professionals. These systems have some powerful, albeit complex, functions. The problem is that when a violation or crime occurs, the law enforcement professionals have to act quickly. The longer the reaction time, the greater the chance that the bad egg will be tougher to apprehend. Delay is harmful. The systems, however, require that an individual enter a query, retrieve information, process it, and then use another two or three systems in order to get a reasonably complete picture of the available information related to the matter under investigation.
The systems have a bottleneck. The human. Law enforcement personnel, on the other hand, have to move quickly. As a result, the fancy online systems operate in one time environment and the law enforcement professionals operate in another. The opportunity to create systems that bring both time universes together is significant. Giving a law enforcement team mobile comms for real-time talk is good, but without the same speedy and fluid access to the data in the larger information systems, the time problem becomes a barrier.
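To make the "two time universes" idea concrete, here is a minimal sketch, under assumptions, of a federated front end that fans one query out to several back-end systems at once. The system names and fetch functions are hypothetical placeholders, not real government systems.

```python
# Minimal sketch of a federated front end that fans one query out to
# several back-end systems in parallel, so the investigator is not the
# serial bottleneck. System names and fetch functions are hypothetical
# placeholders.

from concurrent.futures import ThreadPoolExecutor

SYSTEMS = {
    "records_system": lambda q: f"records hits for '{q}'",
    "case_system": lambda q: f"case files mentioning '{q}'",
    "watchlist_system": lambda q: f"watchlist matches for '{q}'",
}

def federated_search(query: str) -> dict:
    """Run the same query against every system concurrently and merge results."""
    with ThreadPoolExecutor(max_workers=len(SYSTEMS)) as pool:
        futures = {name: pool.submit(fetch, query) for name, fetch in SYSTEMS.items()}
        return {name: future.result() for name, future in futures.items()}

if __name__ == "__main__":
    for system, result in federated_search("person of interest").items():
        print(f"{system}: {result}")
```

The code is not the point; the point is that the machine, not the investigator, should absorb the serial waiting time.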
Opportunity in online and search, therefore, is significant. Vendors who pitch another fancy search algorithm are missing the train in law enforcement, financial services, competitive intelligence, and medical research. Going fast is no longer a way to add value. Merging different time frameworks is a more interesting area to me.
Stephen Arnold, February 26, 2009
Information Must Be Destroyed
February 25, 2009
Another ComputerWorld story caught my attention. This one is the work of Ben Rothke. Either he or his editor came up with an interesting headline: “Why Information Must Be Destroyed” here. When I hear about a Draconian action that could endanger my cybergoose life, I twitch my feathers. Partly in fear. Partly in annoyance. If we live in an information age, nuking zeros and ones strikes me as an interesting notion. I was not aware that information could be destroyed. Once I know something under this mandate, the way to get rid of the information is to put this goose in the roaster. Not a happy thought for the goose in my opinion.
Mr. Rothke, like other fun-loving ComputerWorld scribes, wrote about deleting data in an organization. He acknowledged that there are rules and regulations about information. He noted that paying a fine for destroying information might be preferable to a perp walk. After I worked my way through the seven-part article, I reread my notes; to wit:
- I think that organizations need to obtain legal counsel about what to keep and what methods to follow
- The notion of mixing records management, eDiscovery, and mandated information retention may benefit from a requirements analysis
- In a bad economy, litigation can be troublesome. Government inquiries can be troublesome. Disgruntled employees releasing information can be troublesome.
A concerned manager will want to use the ComputerWorld write up as a thought starter, not a road map. Destroy information? Well, you need to make sure you get it all and button up anyone who knows the information. There are many ways to qualify for a perp walk. The ComputerWorld article sidesteps some of the more interesting facets of destroying information. The task is easier said than done in my experience.
Stephen Arnold, February 25, 2009
Autonomy Encomium
February 23, 2009
If you love Autonomy, you will delight in “Autonomy Continues the Path to eDiscovery with Conceptual Search.” The story appeared in CMSWire here. The write up follows a familiar and entertaining path. The news was so good that Morningstar documented here that Autonomy’s CFO Sushovan Hussain snapped up 85,000 shares of Autonomy stock in February 2009. For this deal, Mr. Hussain sold some shares and turned around and bought more. Life seems to be good for Autonomy: as its competitors paddle harder, Autonomy sails on the winds of success.
Stephen Arnold, February 23, 2009
Guidance That Leads Astray
February 16, 2009
I find this story somewhat difficult to believe. In fact, if it were not distributed by the estimable Yahoo here, I would have ignored the write up. The core of the story is that a firm providing eDiscovery systems, software, and services to law firms and corporate legal departments mishandled its own eDiscovery process. The introductory paragraphs of the Yahoo story seem like a television treatment for a Law and Order rerun:
Guidance Software Inc. bills itself as the leading provider of technology that helps companies dig up old e-mails and other electronic documents that might be evidence in a lawsuit. Yet when Guidance itself had to face a judge, it was accused of bumbling its internal digital search. Whether Guidance intentionally hid documents or just couldn’t find them is a matter of dispute. The company said it did all that was required. But its inability to cough up certain e-mails, even over several months, led an arbitrator to accuse it of gross negligence and proceeding in bad faith.
I don’t quote from Associated Press stories. Their legal eagles frighten 65-year-old geese here in Harrod’s Creek. If you have the courage, you can read the Associated Press’s version of this story here. Keep in mind that I don’t know if this is accurate or an elaborate spoof. But I quite fancy the award graphic on the Guidance Web site:
If you want more information about the company, Guidance Software, Inc., click here. If you are looking for an eDiscovery vendor, you might want to double-check your short list. I can suggest one outfit that would not make me comfortable if this remarkable Yahoo News story turned out to be accurate. I am on the fence about this eDiscovery matter. It would be orange jumpsuit territory if an eDiscovery company could not perform eDiscovery.
Stephen Arnold, February 16, 2009
Great Bit Faultline: IT and Legal Eagles
February 6, 2009
The legal conference LegalTech generates quite a bit of information and disinformation about search, content processing, and text mining. Vendors with attorneys on the marketing and sales staff are often more cautious in their wording even though these professionals are not the school president type personalities some vendors prefer. Other vendors are “all sales all the time” and this crowd surfs the trend waves.
You will have to decide whose news release to believe. I read an interesting story in Centre Daily Times here called “Continuing Disconnect between IT and Legal Greatly Hindering eDiscovery Efforts, Recommind Survey Finds”. The article makes a point for which I have only anecdotal information; namely, that information technology wizards know little about the eDiscovery game. IT wonks want to keep systems running, restore files, and prevent users from mucking up the enterprise systems. eDiscovery, on the other hand, wants to pore through data, suck it into a system that prevents spoliation (a fancy word for deleting or changing documents), and create a purpose-built system that attorneys can use to fight for truth, justice, and the American way.
Now, Recommind, one of the many firms claiming leadership in the eDiscovery space, reports the results of a survey. (Without access to the sample selection method, the details of the analytic tools, the questionnaire itself, and the folks who did the analysis, I’m flying blind.) The article asserts:
Recommind’s survey demonstrates that there is significant work remaining to achieve this goal: only 37% of respondents reported that legal and IT are working more closely together than a year before. This issue is compounded by the fact that only 21% of IT respondents felt that eDiscovery was a “very high” priority, in stark contrast with the overwhelming importance attached to eDiscovery by corporate legal departments. Furthermore, there remains a significant disconnect between corporate accountability and project responsibility, with legal “owning” accountability for eDiscovery (73% of respondents), records management (47%) and data retention (50%), in spite of the fact that the IT department actually makes the technology buying decisions for projects supporting these areas 72% of the time. Exacerbating these problems is an alarming shortage of technical specifications for eDiscovery-related projects. Only 29% of respondents felt that IT truly understood the technical requirements of eDiscovery. The legal department fared even worse, with only 12% of respondents indicating that legal understood the requirements. Not surprisingly, this disconnect is leading to a lack of confidence in eDiscovery project implementation, with only 27% of respondents saying IT is very helpful during eDiscovery projects, and even fewer (16%) believing legal is.
My reaction to these alleged findings was, “Well, makes sense.” You will need to decide for yourself. My hunch is that IT and legal departments are a little like the Hatfields and the McCoys. No one knows what the problem is, but there is a problem.
What I find interesting is that enterprise search and content processing systems are generally inappropriate for the rigors of eDiscovery and other types of legal work. What’s amusing is a search vendor trying to sell to a lawyer who has just been surprised in a legal action. The lawyer has some specific needs, and most enterprise search systems don’t meet these. Equally entertaining is a purpose built legal system being repackaged as a general purpose enterprise search system. That’s a hoot as well.
As the economy continues its drift into the financial Bermuda Triangle, I think everyone involved in legal matters will become more, not less, testy. Stratify, for example, began life as Purple Yogi and an intelligence-centric tool. Now Stratify is a more narrowly defined system with a clutch of legal functions. Does an IT department understand a Stratify? Nope. Does an IT department understand a general purpose search system like Lucene? Nope. Generalists have a tough time understanding the specific methods of experts who require a point solution.
In short, I think the numbers in the Recommind study may be subject to questions, but the overall findings seem to be generally on target.
Stephen Arnold, February 6, 2009
Certified Search: Who Was First
February 5, 2009
I chuckled when I read “Autonomy Introduces Industry First Search Process Validation Module to Ensure Defensible Search” here. The story asserts:
Autonomy Corporation plc (LSE: AU. or AU.L), a global leader in infrastructure software for the enterprise, today unveiled the industry’s most advanced, forensically sound search module, Search Process Validation (SPV).
You can get more information about Autonomy here. The reason for my giggle? There are some folks who have been “certifying” search results for a while. Check out Iron Mountain’s Stratify and Clearwell Systems.
Stephen Arnold, February 5, 2009
Lexalytics’ Jeff Caitlin on Sentiment and Semantics
February 3, 2009
Editor’s Note: Lexalytics is one of the companies that is closely identified with analyzing text for sentiment. When a flow of email contains a negative message, Lexalytics’ system can flag that email. In addition, the company can generate data that provides insight into how people “feel” about a company or product. I am simplifying, of course. Sentiment analysis has emerged as a key content processing function, and like other language-centric tasks, the methods are of increasing interest.
Jeff Caitlin will speak at what has emerged as the “must attend” search and content processing conference in 2009. The Infonortics’ Boston Search Engine meeting features speakers who have an impact on sophisticated search, information processing, and text analytics. Other conferences respond to public relations; the Infonortics’ conference emphasizes substance.
If you want to attend, keep in mind that attendance at the Boston Search Engine Meeting is limited. To get more information about the program, visit the Infonortics Ltd. Web site at www.infonortics.com or click here.
The exclusive interview with Jeff Caitlin took place on February 2, 2009. Here is the text of the interview conducted by Harry Collier, managing director of Infonortics and the individual who created this content-centric conference more than a decade ago. Beyond Search has articles about Lexalytics here and here.
Will you describe briefly your company and its search / content processing technology?
Lexalytics is a Text Analytics company that is best known for our ability to measure the sentiment or tone of content. We plug in on the content processing side of the house, and take unstructured content and extract interesting and useful metadata that applications like Search Engines can use to improve the search experience. The types of metadata typically extracted include: Entities, Concepts, Sentiment, Summaries and Relationships (Person to Company for example).
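Editor’s illustration: a deliberately tiny, lexicon-based sketch of the kind of metadata extraction described above. It is not Lexalytics’ engine; the word lists and heuristics are illustrative assumptions only.

```python
# Toy metadata extractor: a crude stand-in for the kind of output a
# commercial text analytics engine produces (entities, sentiment, summary).
# Word lists and heuristics are illustrative assumptions.

import re

POSITIVE = {"great", "useful", "nicest", "improve"}
NEGATIVE = {"sucks", "nasty", "confusing", "negligent"}

def extract_metadata(text: str) -> dict:
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    # Naive "entity" heuristic: runs of two or more capitalized tokens.
    entities = re.findall(r"(?:[A-Z][a-z]+ )+[A-Z][a-z]+", text)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {
        "entities": entities,
        "sentiment": sentiment,
        "summary": text.split(".")[0] + ".",  # first sentence as a stand-in summary
    }

if __name__ == "__main__":
    sample = "John Smith is a great guy and one of the nicest people I have met."
    print(extract_metadata(sample))
```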
With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?
The simple fact that machines aren’t smart like people and don’t actually “understand” the content they are processing… or at least they haven’t to date. The new generation of text processing systems has advanced grammatical parsers that are allowing us to tackle some of the nasty problems that have stymied us in the past. One such example is anaphora resolution, sometimes referred to as “pronominal preference”, which is a bunch of big, confusing-sounding words for the understanding of pronouns. Take the sentence, “John Smith is a great guy, so great that he’s my kid’s godfather and one of the nicest people I’ve ever met.” For people this is a pretty simple sentence to parse and understand, but for a machine it has given us fits for decades. Now with grammatical parsers we understand that “John Smith” and “he” are the same person, and we also understand who the speaker is and what the subject of the sentence is. This enhanced level of understanding is going to improve the accuracy of text parsing and allow for a much deeper analysis of the relationships in the mountains of data we create every day.
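Editor’s illustration: a naive sketch of the anaphora problem described above. The heuristic simply links a pronoun to the most recently seen capitalized name pair; real grammatical parsers are far more sophisticated.

```python
# Naive illustration of anaphora resolution: link "he" or "she" to the most
# recently seen pair of capitalized tokens. A toy sketch only.

import re

def resolve_pronouns(sentence: str) -> dict:
    tokens = sentence.split()
    last_name = None
    resolutions = {}
    i = 0
    while i < len(tokens):
        word = tokens[i].strip(",.!?")
        nxt = tokens[i + 1].strip(",.!?") if i + 1 < len(tokens) else ""
        if re.fullmatch(r"[A-Z][a-z]+", word) and re.fullmatch(r"[A-Z][a-z]+", nxt):
            last_name = f"{word} {nxt}"   # remember the most recent full name
            i += 2
            continue
        if word.lower() in {"he", "she"} and last_name:
            resolutions[f"token {i}: {word}"] = last_name
        i += 1
    return resolutions

if __name__ == "__main__":
    print(resolve_pronouns("John Smith is a great guy, and he is my kid's godfather."))
```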
What is your approach to problem solving in search and content processing? Do you focus on smarter software, better content processing, improved interfaces, or some other specific area?
Lexalytics is definitely on the better content processing side of the house; our belief is that you can only go so far by improving the search engine… eventually you’re going to have to make the data better to improve the search experience. This is 180 degrees apart from Google, which focuses exclusively on the search algorithms. This works well for Google in the web search world, where you have billions of documents at your disposal, but it hasn’t worked as well in the corporate world, where finding information isn’t nearly as important as finding the right information and helping users understand why it’s important and who understands it. Our belief is that metadata extraction is one of the best ways to learn the “who” and “why” of content so that enterprise search applications can really improve the efficiency and understanding of their users.
With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search / content processing?
For Lexalytics the adverse business climate has altered the mix of our customers, but to date it has not affected the growth in our business (Q1 2009 should be our best ever). What has clearly changed is the mix of customers investing in Search and Content Processing; we typically run about 2/3 small companies and 1/3 large companies. In this environment we are seeing a significant uptick in large companies looking to invest as they seek to increase their productivity. At the same time, we’re seeing a significant drop in the number of smaller companies looking to spend on Text Analytics and Search. The net-net of this is that, if anything, Search appears to be one of the areas that will do well in this climate, because data volumes are going up and staff sizes are going down.
Microsoft acquired Fast Search & Transfer. SAS acquired Teragram. Autonomy acquired Interwoven and Zantaz. In your opinion, will this consolidation create opportunities or shut doors? What options are available to vendors / researchers in this merger-filled environment?
As one of the vendors that works closely with two of the three major Enterprise Search vendors, we see these acquisitions as a good thing. FAST, for example, seems to be a well-run organization under Microsoft, and they seem to be very clear on what they do and what they don’t do. This makes it much easier for both partners and smaller vendors to differentiate their products and services from all the larger players. As an example, we are seeing a significant uptick in leads coming directly from the Enterprise Search vendors that are looking to us for help in providing sentiment/tone measurement for their customers. Though these mergers have been good for us, I suspect that won’t be the case for all vendors. We work with the enterprise search companies rather than against them; if you compete with them, this may make it even harder to be considered.
As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?
The biggest change is going to be the move away from entities that are explicitly stated within a document to a more ‘fluffy’ approach. While this encompasses directly stated relationships – “Joe works at Big Company Inc” is a directly stated relationship – it also encompasses being able to infer the same information from a less direct statement: “Joe got in his car and drove, like he did every day, to his job at Big Company Inc.” It also covers things like processing reviews and understanding from the context of the document that sound quality is a feature of an iPod, rather than relying on a specific list. And it encompasses things of a more semantic nature, such as understanding that a document talking about Congress is also talking about Government, even though Government might not be explicitly stated.
Graphical interfaces and portals (now called composite applications) are making a comeback. Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009? What semantic considerations do you bring to your product and research activities?
One of the key uses of semantic understanding in the future will be in understanding what people are asking or complaining about in content. It’s one thing to measure the sentiment for an item that you’re interested in (say it’s a digital camera), but it’s quite another to understand the items that people are complaining about while reviewing a camera and noting that “the battery life sucks”. We believe that joining the subject of a discussion to the tone for that discussion will be one of the key advancements in semantic understanding that takes place in the next couple of years.
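Editor’s illustration: a minimal sketch of pairing the subject of a complaint with its tone, along the lines described above. The word lists and the proximity rule are illustrative assumptions, not a description of Lexalytics’ method.

```python
# Toy aspect-level sentiment: attach an opinion word to the nearest
# preceding aspect term. Word lists and the proximity rule are
# illustrative assumptions.

ASPECTS = {"battery", "screen", "lens", "sound"}
OPINIONS = {"sucks": "negative", "terrible": "negative", "great": "positive"}

def aspect_sentiment(review: str) -> list:
    tokens = [t.strip(".,!?").lower() for t in review.split()]
    pairs = []
    current_aspect = None
    for token in tokens:
        if token in ASPECTS:
            current_aspect = token            # remember the most recent aspect
        elif token in OPINIONS and current_aspect:
            pairs.append((current_aspect, OPINIONS[token]))
    return pairs

if __name__ == "__main__":
    print(aspect_sentiment("Nice camera, but the battery life sucks and the screen is great."))
```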
Where can I find out more about your products, services and research?
Lexalytics can be found on the web at www.lexalytics.com. Our Web log discusses our thoughts on the industry: www.lexalytics.com/lexablog. A downloadable trial is available here. We also have prepared a white paper, and you can get a copy here.
Harry Collier, February 3, 2009
Autonomy Scores PR Coup
February 1, 2009
London’s newspapers are darned entertaining. I immediately turn to page 3 of any paper I find on the tube when I am in the UK. I don’t buy the papers, though. The Times of London ran a beefy article about Autonomy. You can read the story here. The idea is that Autonomy is “at the heart of a new data revolution.” I had a difficult time following the write up. One comment stuck in my mind:
“Information is going to change from fitting to what a computer needs to a computer fitting to how we do stuff,” Lynch said. “If you believe that the world is going to move over to unstructured data you would expect to see that electricity inside almost every piece of software.”
I think I understand. Electricity will be “inside almost every piece of software.” I don’t know if I can forget this phrase.
Stephen Arnold, February 1, 2009
Facebook and Twitter: Who Owns What
January 30, 2009
If a Facebook or Twitter fails, what happens? What a silly question. According to Jeremy Liew, Facebook is “pretty comfortable” about where the company is “right now”. You can find this statement and quite a bit of useful commentary in the article “Warning: Dependence on Facebook, Twitter Could Be Hazardous to Your Business” here. For me the most important comment in the write up by Mark Glaser was:
If you are planning on using either Twitter or Facebook as a marketing platform for yourself or your business, be sure to read the Terms of Service carefully. That’s what Facebook’s Larry Yu advised when I talked to him. “The important thing for people to do is to review the Terms of Service,” he said. “A lot of people don’t do that. They don’t have experience with it, and we encourage people to do it…There are also terms for application developers. As people decide to develop on the platform, they have to be comfortable with those terms.”
This addled goose is wary of social networks. Some trophy generation denizens believe that they don’t exist unless they are providing information on these publishing platforms. The trophy kids want to “hook up” and keep their “friends” informed about their activities and whereabouts. When one of the trophy kids becomes a person of interest to law enforcement, those social postings are going to be somewhat useful to certain authorities. I wonder if the trophy kids realize that some information which is innocuous at the time it becomes available might provide insights to a more informed thinker. Run a query for profiling and see what you think of that discipline. Finally, there’s a nifty tool called the Analyst’s Notebook. If you are not familiar with it, run a Google query for that puppy. From my point of view the information “in” social systems is fascinating. Technology is an interesting construct. The consequences of technology can be even more interesting. Think search, content processing, link analysis, clustering, and other useful methods crunching on those social data. Yum, yum.
Stephen Arnold, January 31, 2009