Effective Knowledge Management Requires Enterprise Search
October 18, 2012
The post goes on to elaborate on another study with similar results:
“Not enough for you? Seven years ago, an article ran in NewScientist. It highlights a study done at King’s College London, that showed in today’s business setting, marked by emails, smart phone connections,– the connected 24×7 reality of today, the average IQ of an individual drops by about 10 points. The study went on to conclude, (and this is my favorite part), ‘Even smoking dope has less effect on your ability to concentrate on the task in hand.’”
Knowledge management is obviously powerful, but it requires one to step back and consider the available options and information. Enterprise search is a key ingredient of knowledge management, and Intrafind offers best-in-class practices for secure searching with semantic linking and intelligent tagging.
Andrea Hayden, October 18, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
Content Targeting for Optimum Digital Customer Experiences
October 12, 2012
With so many possible outlets for engaging customers via digital resources, it may be difficult for companies to find the right mix of services for their digital initiatives. In “Eight Areas You’ll Invest in for Great Digital Customer Experiences” on the CRM Blog, we learn about steps brands are taking to deliver the best customer experiences with available digital resources.
A recent Forrester survey of Web content management (WCM) professionals shows that the focus is on mobile content delivery, video streaming, email tools, and content targeting.
The article elaborates on the importance of content targeting for authenticated users:
“WCM vendors have been pushing hard their vision and capabilities to help deliver customized and personalized content using their systems, and many are already providing strong capabilities in this area. For many marketers and content pros, however, the technical capacity of a WCM system to manage and deliver targeted content to customers, prospects, and partners is outstripping marketers’ ability to take advantage of it. This can be complex. You need a plan. You need people responsible for the execution of the plan. It’s an ongoing commitment.”
The challenge of content targeting and authentication is a key business information concern. A critical difference exists between enterprise information used to drive business decisions and Web content targeting that drives ads. A capable vendor, such as Intrafind, can help enterprises invest strategically to meet this challenge. Intrafind’s Topic Finder, for example, automatically filters and manages these kinds of information streams.
Andrea Hayden, October 12, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
Concept Searching Enrolls University of California
September 25, 2012
We learned that the University of California has selected Concept Searching technology to process content, automatically classify content, and provide taxonomy management software to the Office of the President. “University of California, Office of the President Using Concept Searching’s Smart Content Framework™” said:
The University of California, Office of the President is the system wide headquarters of the University of California, managing its fiscal and business operations and supporting the academic and research missions across its campuses, labs and medical centers.
conceptClassifier for SharePoint has enabled the University of California, Office of the President to realize search improvements in SharePoint 2007 and in the recent deployment of SharePoint 2010. The university has integrated with the Term Store and taken advantage of the full support of managed metadata properties provided by conceptClassifier for SharePoint.
Martin Garland, president of Concept Searching said:
Using the first two building blocks of the Smart Content Framework™, Metadata and Insight, the University of California, Office of the President was able to rapidly deploy enterprise taxonomies and build the framework to improve search outcomes. This adoption of Concept Searching technologies continues to show our platform is an important component for any organization that places high value on content assets.
Concept Searching provides software products that deliver conceptual metadata generation, auto-classification, and powerful taxonomy management from the desktop to the enterprise. Concept Searching, developer of the Smart Content Framework™, provides organizations with a method to mitigate risk, automate processes, manage information, protect privacy, and address compliance issues. This information governance infrastructure framework utilizes a set of technologies that encompasses the entire portfolio of information assets, resulting in increased organizational performance and agility.
Concept Searching asserts that it is the only platform independent statistical metadata generation and classification software company in the world that uses concept extraction and compound term processing to significantly improve access to unstructured information. The Concept Searching Microsoft suite of technologies runs natively in SharePoint 2010, FAST, Windows Server 2008 R2 FCI, and in Microsoft Office applications.
A June 2012 white paper explaining conceptClassifier is available at this link.
Stephen E Arnold, September 25, 2012
Sponsored by Augmentext
Google Autocomplete: Is Smart Help a Hindrance?
September 10, 2012
You may have heard of the deep extraction company Attensity. There is another company in a similar business with the name inTTENSITY. Note the playful misspelling of the common word “intensity.” What does a person looking for the company inTTENSITY get when he or she runs a query on Google? Look at what Google’s autocomplete suggestions recommend when I type intten:
The company’s spelling appears along with the less helpful “interstate ten”, “internet explorer ten”, and “internet icon top ten.” If I enter “inten”, I don’t get the company name. No surprise.
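A minimal sketch of why “inten” cannot surface the company under strict prefix matching. This is not Google’s method, which is not public and clearly does something looser, since it also offers “interstate ten” for “intten.” The function name `suggest` and the phrase list are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left

def suggest(prefix, phrases, limit=5):
    """Return up to `limit` phrases that start with `prefix`, ignoring case."""
    prefix = prefix.lower()
    ordered = sorted(p.lower() for p in phrases)
    # Binary search to the first phrase that could start with the prefix.
    start = bisect_left(ordered, prefix)
    matches = []
    for phrase in ordered[start:]:
        if not phrase.startswith(prefix):
            break  # sorted order: no later phrase can match either
        matches.append(phrase)
        if len(matches) == limit:
            break
    return matches

# Hypothetical phrase list, not real query data.
PHRASES = ["inTTENSITY", "intensity", "interstate ten", "internet explorer ten"]

print(suggest("intte", PHRASES))  # the double-t spelling matches
print(suggest("inten", PHRASES))  # only the common word matches
```

Under this assumption, the double “t” is the whole story: “inten” is simply not a prefix of “inttensity,” so a searcher who spells the word the ordinary way never sees the brand.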
Is Google’s autocomplete a help or hindrance? The answer, in my opinion, is that it depends on the user and what he or she is seeking.
I just read “Germany’s Former First Lady Sues Google For Defamation Over Autocomplete Suggestions.” According to the write up:
When you search for “Bettina Wulff” on Google, the search engine will happily autocomplete this search with terms like “escort” and “prostitute.” That’s obviously not something you would like to be associated with your name, so the wife of former German president Christian Wulff has now, according to Germany’s Süddeutschen Zeitung, decided to sue Google for defamation. The reason why these terms appear in Google’s autocomplete is that there have been persistent rumors that Wulff worked for an escort service before she met her husband. Wulff categorically denies that this is true.
The article explains that autocomplete has been the target of criticism before. The concluding statement struck me as interesting:
In Japan, a man recently filed a suit against Google after the autocomplete feature started linking his names with a number of crimes he says he wasn’t involved in. A court in Japan then ordered Google to delete these terms from autocomplete. Google also lost a similar suit in Italy in 2011.
I have commented about the interesting situations predictive algorithms can create. I assume that Google’s numerical recipes chug along like a digital and intent-free robot.
More Content Processing Brand Confusion
September 7, 2012
On a call with a so-so investment outfit once spawned from JP Morgan’s empire, the whiz kids on the call with me asked me to name some interesting companies I was monitoring. I spit out two or three. One name created a hiatus. The spiffy young MBA asked me, “Are you tracking a pump company?”
I realized that when one names search and content processing firms, the name of the company and its brand are important. I was referring to an outfit called “Centrifuge”, a firm along with dozens if not hundreds of others in the pursuit of the big data rainbow. The company has an interesting product, and you can read about the firm at www.centrifugesystems.com.
Now the confusion. Google thinks Centrifuge business intelligence is the same as centrifuge coolant sludge systems. Interesting.
There is a pump and valve outfit called Centrifuge at www.centrisys.us. This outfit, it turns out, has a heck of a marketing program. On YouTube, a search for “centrifuge systems” returns a raft of information timber about viscosity, manganese phosphate, and lead dust slurry.
I have commented on the “findability” problem in the search, analytics, and content processing sector in my various writings and in my few and far between public speaking engagements. My 68 years weigh heavily on me when a 20-something pitches a talk in some place far from Harrod’s Creek, Kentucky.
The semantic difference between analytics and lead dust slurry is obvious to me. To the indexing methods in use at Baidu, Bing, Exalead, Google, Jike, and Yandex—not so much.
How big of a problem is this? You can see that Brainware, Sinequa, Thunderstone, and dozens of other content-centric outfits are conflated with questionable videos, electronic games, and Latin phrases. When looking for these companies and their brands via mobile devices, the findability challenge gets harder, not easier. The constant stream of traditional news releases, isolated blog posts, white papers which are much loved by graduate students in India, and Web collateral miss their intended audiences. I prefer “miss” to the blunt reality of “unread content.”
I am going to start a file in which to track brand confusion and company name erosion. Search, analytics, and content processing vendors should know that preserving the semantic “magnetism” of a word or phrase is important. Surprising it is to me that I can run a query and get links to visual network analytics alongside high performance centrifuges. Some watching robots pay close attention to the “centrifuge” concept, I assume.
Brand management is important.
Stephen E Arnold, September 7, 2012
Sponsored by Augmentext
Twitter Politics
August 31, 2012
Oh, goody, more predictive silliness. TechNewsWorld informs us, “Twindex Tracks Pols’ Twitter Temperatures.” Clever name, though it does make me think more about window cleaning than about politics. That’s ok; window cleaning is the more engaging subject.
The full name of the metric is the Twitter Political Index, and it tracks tweeters’ daily thoughts about the two presidential candidates. Twitter created the index with the help of Topsy Labs and pollsters at the Mellman Group and North Star Opinion Research. The polling firms helped validate and tune the algorithms. It is Topsy’s job to track tweets for certain terms and compare sentiment on each candidate. So far, the incumbent seems to be well ahead in the Twittersphere.
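To make the idea of tracking tweets for certain terms and comparing sentiment concrete, here is a toy sketch. It is not the Twindex algorithm, which the article does not describe in detail. The word lists, the candidate names “Alpha” and “Beta,” and the function names are all hypothetical.

```python
import re

# Hypothetical sentiment lexicon; a real system would use far richer models.
POSITIVE = {"great", "love", "strong"}
NEGATIVE = {"weak", "awful", "fail"}

def tweet_sentiment(text):
    """Score a tweet as (positive word count) minus (negative word count)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def index_by_candidate(tweets, candidates):
    """Average the sentiment of the tweets that mention each candidate."""
    scores = {name: [] for name in candidates}
    for tweet in tweets:
        words = set(re.findall(r"[a-z']+", tweet.lower()))
        for name in candidates:
            if name.lower() in words:
                scores[name].append(tweet_sentiment(tweet))
    return {name: (sum(s) / len(s) if s else 0.0) for name, s in scores.items()}

# Made-up tweets about made-up candidates.
tweets = [
    "Alpha gave a great speech",
    "Beta was awful and weak",
    "I love Alpha",
]
print(index_by_candidate(tweets, ["Alpha", "Beta"]))
```

Even in this toy form, the index only measures the people who tweet, which is the weakness the Pew figures below point to.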
But how far can we trust the Twindex? Probably not very far. Writer Richard Adhikari observes:
“The Pew Research Center has found that only 15 percent of adults online use Twitter. On a typical day, that figure is only 8 percent. . . .
“Overall, nearly 30 percent of young adults use Twitter, up from 18 percent the previous year. One in five people aged 18 to 24 uses Twitter on a typical day.
“Further, 11 percent of adults aged 25 to 34 use Twitter on a typical day.
“African-Americans are also heavy Twitter users, with 28 percent of them using Twitter overall and 13 percent doing so on a typical day.
“Urban and suburban residents are also significantly more likely to use Twitter than those in rural areas, Pew found.”
So, yeah, statistically Democrats are likely to fare better among Twitter users than Republicans. This index is about as valuable as any political echo chamber—for entertainment only. Personally, I’d rather be washing windows.
Cynthia Murrell, August 31, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
Document Management Is Ripe For eDiscovery
July 18, 2012
If you work in any aspect related to the legal community, you should be aware that eDiscovery generates a great deal of chatter. Like most search and information retrieval functions, progress is erratic.
While eDiscovery, according to the marketers who flock to Legal Tech and other conferences, will save clients and attorneys millions of dollars in the long run, there will still be some associated costs with it. Fees do not magically disappear and eDiscovery will have its own costs that can accrue, even if they may be a tad lower than the regular attorney’s time sheets.
One way to keep costs down is to create a document management policy, so if you are ever taken to court it will reduce the amount of time and money spent in the litigation process. We have mixed feelings about document management. The systems are often problematic because the management guidance and support are inadequate. Software cannot “fix” this type of issue. Marketers, however, suggest software may be up to the task.
JD Supra discusses the importance of a document management plan in “eDiscovery and Document Management.” The legal firm of Warner, Norcross, and Judd wrote a basic strategy guide for JD Supra for people to get started on a document management plan. A plan’s importance is immeasurable:
“With proper document management, you’ll have control over your systems and records when a litigation hold is issued and the eDiscovery process begins, resulting in reduced risk and lower eDiscovery costs. This is imperative because discovery involving electronically stored data — including e-mail, voicemail, calendars, text messages and metadata — is among the most time-consuming and costly phases of any dispute. Ultimately, an effective document management policy is likely to contribute to the best possible outcome of litigation or an investigation.”
The best way to start working on a plan is to outline your purpose and scope: know what you need and want the plan to do. Also specify who will be responsible for each part of the plan; not designating proper authority can leave the entire plan in limbo. Never forget a records retention policy. It is legally required to keep most data for seven years or permanently, but some data can be deleted. Do not pay for data you do not have to keep. Most important of all, provide specific direction for individual tasks, such as scanning, word management, destruction schedule, and observing litigation holds. One last thing: never underestimate the importance of employee training and audit schedules; the latter will sneak up on you before you know it.
If, however, you are still hesitant, know that failing to draft a plan can carry some hefty consequences:
- “Outdated and possibly harmful documents might be available and subject to discovery.
- Failure to produce documents in a timely fashion might result in fines and jail time: one large corporation was charged with misleading regulators and not producing evidence in a timely matter and was fined $10 million.
- Destroying documents in violation of federal statutes and regulations may result in fines and jail time: one provision of the Sarbanes-Oxley Act specifies a prison sentence of up to 20 years for someone who knowingly destroys documents with the intent to obstruct a government investigation.”
A document management plan is a tool meant to guide organizations in managing their data, outlining the tasks associated with it, and preparing for eventual audits and litigation procedures. Having a document management plan in place will make the eDiscovery process go more quickly, but another way to make the process even faster and more accurate is to use litigation support technology and predictive coding, such as that provided by Polyspot.
Here at Beyond Search we have a healthy skepticism for automated content processing. Some systems perform quite well in quite specific circumstances. Examples include Digital Reasoning and Ikanow. Other systems are disappointing. Very disappointing. Who are the disappointing vendors? Not in this free blog. Sign up for Honk!, our no holds barred newsletter, and get that opt-in, limited distribution information today.
Whitney Grace, July 18, 2012
Sponsored by Polyspot
Trimming Legal Costs and Jobs: A Predictive Coding Unintended Consequence?
July 17, 2012
Predictive coding and eDiscovery are circling the legal community’s gossip rings, raising questions about what they mean for the future of legal costs and jobs. The Huffington Post addresses the topic in “ ‘Lawyerbots’ Offer Attorneys Faster, Cheaper Assistants.” The US court system has made new regulations when it comes to eDiscovery technology and how it can be used in court cases. Lawyers, legal professionals, and even the companies licensing various programmatic content processing systems are struggling to understand the upside and downside of the algorithmic approach to coding. One way eDiscovery and predictive coding will be used is to cut down on the many, many hours of post-processing some electronic documents. This new technology is being referred to as “lawyerbots.”
Lawyerbots cut through the man-hours like an electric knife, saving time and clients money. Many are optimistic about the changes. But some clients are ambivalent:
“But how will clients feel about a computer doing some of the dirty work, instead of a lawyer or paralegal manually digging through documents? Some could be concerned that a computer is more apt to make an error, or overlook crucial information. In a recent study in the Richmond Journal of Law and Technology, lawyer labor was tested against lawyerbots with predictive coding software. Researchers found “evidence that such technology-assisted processes, while indeed more efficient, can also yield results superior to those of exhaustive manual review.” In basic terms, the computers had the humans licked.”
Faster and more accurate! It is an awesome combination, but the next question to follow is what about jobs? There are several predictions already out there; the article mentions how Mike Lynch of Autonomy believes the legal community will employ fewer people in the future. Others are embracing the new technology pattern and plan to see changes as the older lawyers retire. Here’s one observation:
“Jonathan Askin, the director of Brooklyn Law School’s Brooklyn Law Incubator and Policy Clinic (BLIP)…said, ‘When I look around at my peers, I see 40-year-old lawyers who are still communicating via snail mail and fax machines and telephones and appearing in physical space for negotiations.’ He said he hopes to better merge the legal sector and technology to serve both lawyers and their clients more efficiently.”
We arrive at yet another crossroads: traditional, variable cost ways vs. new, allegedly more easily budgeted approaches to content analysis.
As a librarian, I predict, without having to use predictive analytics, that eDiscovery will take some legal occupations. Online wreaked havoc in the special library market. However, I am confident that there will still be a need for humans to keep the lawyerbots and maybe the marketers of these systems in check.
After all, software technology is only as smart as humans program it to be, and humans are prone to error. The lawyerbots will also drive down costs, a blessing in this poor economy, and more people will be apt to bring cases to court, increasing demand for lawyers. In order to get to this point, however, there needs to be an established set of standards on how litigation support software can be programmed, how it can be used, and basic requirements for the processes/code. What’s the outlook? Uncertainty and probably one step forward and one step backwards.
Whitney Grace, July 17, 2012
Sponsored by Polyspot
Google and Latent Semantic Indexing: The KnowledgeGraph Play
June 26, 2012
One thing that is always constant is Google changing itself. Not too long ago Google introduced yet another new tool: Knowledge Graph. Business2Community spoke highly about how this new application proves the concept of latent semantic indexing in “Keyword Density is Dead…Enter ‘Thing Density.’” Google’s claim to fame is providing the most relevant search results based on a user’s keywords. Every time the company updates its algorithm, it is to keep relevancy up. The new Knowledge Graph allows users to break down their search by clustering related Web sites and finding what LSI exists between the results. From there the search conducts a secondary search and so on. Google does this to reflect the natural use of human language, i.e., making its products user friendly.
But this change raises an important question:
“What does it mean for me!? Well first and foremost keyword density is dead, I like to consider the new term to be “Concept Density” or to coin Google’s title to this new development “Thing Density.” Which thankfully my High School English teachers would be happy about. They always told us to not use the same term over and over again but to switch it up throughout our papers. Which is a natural and proper style of writing, and we now know this is how Google is approaching it as well.”
The change means good content and SEO will be rewarded. This does not change the fact, of course, that Google will probably change its algorithm again in a couple of months, but now the company is recognizing that LSI has value. Most IVPs that provide latent semantic indexing, content and text analytics, such as Content Analyst, have gone way beyond what Google is offering, using the latest LSI trends to make data more findable and discover new correlations.
Whitney Grace, June 26, 2012
Sponsored by Content Analyst
The Alleged Received Wisdom about Predictive Coding
June 19, 2012
Let’s start off with a recommendation. Snag a copy of the Wall Street Journal and read the hard copy front page story in the Marketplace section, “Computers Carry Water of Pretrial Legal Work.” In theory, you can read the story online if you don’t have Sections A-1, A-10 of the June 18, 2012, newspaper. A variant of the story appears as “Why Hire a Lawyer? Computers Are Cheaper.”
Now let me offer a possibly shocking observation: The costs of litigation are not going down for certain legal matters. Neither bargain basement human attorneys nor Fancy Dan content processing systems make the legal bills smaller. Your mileage may vary, but for those snared in some legal traffic jams, costs are tough to control. In fact, search and content processing can impact costs, just not in the way some of the licensees of next generation systems expect. That is one of the mysteries of online that few can penetrate.
The main idea of the Wall Street Journal story is that “predictive coding” can do work that human lawyers do for a higher cost but sometimes with much less precision. That’s the hint about costs in my opinion. But the article is traditional journalistic gold. Coming from the Murdoch organization, what did I expect? i2 Group has been chugging along with relationship maps for case analyses of important matters since 1990. Big alert: i2 Ltd. was a client of mine. Let’s see that was more than a couple of weeks ago that basic discovery functions were available.
The write up quotes published analyses which indicate that when humans review documents, those humans get tired and do a lousy job. The article cites “experts” from Thomson Reuters, a firm steeped in legal and digital expertise, who point out that predictive coding is going to be an even bigger business. Here’s the passage I underlined: “Greg McPolin, an executive at the legal outsourcing firm Pangea3 which is owned by Thomson Reuters Corp., says about one third of the company’s clients are considering using predictive coding in their matters.” This factoid is likely to spawn a swarm of azure chip consultants who will explain how big the market for predictive coding will be. Good news for the firms engaged in this content processing activity.
What goes faster? The costs of a legal matter or the costs of a legal matter that requires automation and trained attorneys? Why do companies embrace automation plus human attorneys? Risk certainly is a turbo charger?
The article also explains how predictive coding works, offers some cost estimates for various actions related to a document, and adds some cautionary points about predictive coding proving itself in court. In short, we have a touchstone document about this niche in search and content processing.
My thoughts about predictive coding are related to the broader trends in the use of systems and methods to figure out what is in a corpus and what a document is about.
First, the driver for most content processing is related to two quite human needs. First, the cost of coping with large volumes of information is high and going up fast. Second, there is the need to reduce risk. Most professionals find quips about orange jump suits, sharing a cell with Mr. Madoff, and the iconic “perp walk” downright depressing. When a legal matter surfaces, the need to know what’s in a collection of content like corporate email is high. The need for speed is driven by executive urgency. The cost factor clicks in when the chief financial officer has to figure out the costs of determining what’s in those documents. Predictive coding to the rescue. One firm used the phrase “rocket docket” to communicate speed. Other firms promise optimized statistical routines. The big idea is that automation is faster and cheaper than having lots of attorneys sifting through documents in printed or digital form. The Wall Street Journal is right. Automated content processing is going to be a big business. I just hit the two key drivers. Why dance around what is fueling this sector?