Google Autocomplete: Is Smart Help a Hindrance?

September 10, 2012

You may have heard of the deep extraction company Attensity. There is another company in a similar business with the name inTTENSITY. Note the playful misspelling of the common word “intensity.” What does a person looking for the company inTTENSITY get when he or she runs a query on Google? Look at what Google’s autocomplete suggestions recommend when I type intten:

[Image: Google autocomplete suggestions for the query “intten”]

The company’s spelling appears along with the less helpful “interstate ten”, “internet explorer ten”, and “internet icon top ten.” If I enter “inten”, I don’t get the company name. No surprise.

[Image: Google autocomplete suggestions for the query “inten”]
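
For readers who want to poke at this themselves, here is a minimal Python sketch that pulls suggestions from Google’s suggest endpoint. The suggestqueries.google.com URL is an unofficial, undocumented interface that may change or vanish, so treat the snippet as an illustration rather than a supported API.

    # Minimal sketch: pull Google's autocomplete suggestions for a prefix.
    # The suggestqueries.google.com endpoint is unofficial and may change or
    # disappear without notice; this illustrates the idea, nothing more.
    import json
    import urllib.parse
    import urllib.request

    def autocomplete(prefix):
        url = ("https://suggestqueries.google.com/complete/search"
               "?client=firefox&q=" + urllib.parse.quote(prefix))
        with urllib.request.urlopen(url) as response:
            payload = json.loads(response.read().decode("utf-8", errors="replace"))
        # Response shape: [query, [suggestion, suggestion, ...]]
        return payload[1]

    for prefix in ("intten", "inten"):
        print(prefix, "->", autocomplete(prefix))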

Is Google’s autocomplete a help or a hindrance? The answer, in my opinion, is that it depends on the user and what he or she is seeking.

I just read “Germany’s Former First Lady Sues Google For Defamation Over Autocomplete Suggestions.” According to the write up:

When you search for “Bettina Wulff” on Google, the search engine will happily autocomplete this search with terms like “escort” and “prostitute.” That’s obviously not something you would like to be associated with your name, so the wife of former German president Christian Wulff has now, according to Germany’s Süddeutschen Zeitung, decided to sue Google for defamation. The reason why these terms appear in Google’s autocomplete is that there have been persistent rumors that Wulff worked for an escort service before she met her husband. Wulff categorically denies that this is true.

The article explains that autocomplete has been the target of criticism before. The concluding statement struck me as interesting:

In Japan, a man recently filed a suit against Google after the autocomplete feature started linking his names with a number of crimes he says he wasn’t involved in. A court in Japan then ordered Google to delete these terms from autocomplete. Google also lost a similar suit in Italy in 2011.

I have commented about the interesting situations predictive algorithms can create. I assume that Google’s numerical recipes chug along like a digital and intent-free robot.

Read more

More Content Processing Brand Confusion

September 7, 2012

On a call with a so-so investment outfit once spawned from JP Morgan’s empire, the whiz kids on the call with me asked me to name some interesting companies I was monitoring. I spit out two or three. One name created a hiatus. The spiffy young MBA asked me, “Are you tracking a pump company?”

I realized that when one names search and content processing firms, the name of the company and its brand are important. I was referring to an outfit called “Centrifuge”, a firm along with dozens if not hundreds of others in the pursuit of the big data rainbow. The company has an interesting product, and you can read about the firm at www.centrifugesystems.com.

Now the confusion. Google thinks Centrifuge business intelligence is the same as centrifuge coolant sludge systems. Interesting.

[Image: relationship detail]

There is a pump and valve outfit called Centrifuge at www.centrisys.us. This outfit, it turns out, has a heck of a marketing program. On YouTube, a search for “centrifuge systems” returns a raft of information timber about viscosity, manganese phosphate, and lead dust slurry.

I have commented on the “findability” problem in the search, analytics, and content processing sector in my various writings and in my few and far between public speaking engagements. My 68 years weigh heavily on me when a 20-something pitches a talk in some place far from Harrod’s Creek, Kentucky.

The semantic difference between analytics and lead dust slurry is obvious to me. To the indexing methods in use at Baidu, Bing, Exalead, Google, Jike, and Yandex—not so much.

How big of a problem is this? You can see that Brainware, Sinequa, Thunderstone, and dozens of other content-centric outfits are conflated with questionable videos, electronic games, and Latin phrases. When looking for these companies and their brands via mobile devices, the findability challenge gets harder, not easier. The constant stream of traditional news releases, isolated blog posts, white papers which are much loved by graduate students in India, and Web collateral miss their intended audiences. I prefer “miss” to the blunt reality of “unread content.”

I am going to start a file in which to track brand confusion and company name erosion. Search, analytics, and content processing vendors should know that preserving the semantic “magnetism” of a word or phrase is important. It surprises me that I can run a query and get links to visual network analytics alongside high performance centrifuges. Some watching robots pay close attention to the “centrifuge” concept, I assume.

Brand management is important.

Stephen E Arnold, September 7, 2012

Sponsored by Augmentext

Twitter Politics

August 31, 2012

Oh, goody, more predictive silliness. TechNewsWorld informs us, “Twindex Tracks Pols’ Twitter Temperatures.” Clever name, though it does make me think more about window cleaning than about politics. That’s ok; window cleaning is the more engaging subject.

The full name of the metric is the Twitter Political Index, and it tracks tweeters’ daily thoughts about the two presidential candidates. Twitter created the index with the help of Topsy Labs and pollsters at the Mellman Group and North Star Opinion Research. The polling firms helped validate and tune the algorithms. It is Topsy’s job to track tweets for certain terms and compare sentiment on each candidate. So far, the incumbent seems to be well ahead in the Twittersphere.
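
The mechanics are easy to caricature. The sketch below is not Topsy’s method; it is a toy, lexicon-based comparison with invented word lists and tweets, included only to show the kind of arithmetic a sentiment index performs.

    # Toy sketch of the kind of comparison a sentiment index performs: score
    # tweets mentioning each candidate with a small lexicon and compare the
    # averages. Word lists and tweets are invented; Topsy's models are far
    # more elaborate.
    POSITIVE = {"great", "strong", "win", "hope", "good"}
    NEGATIVE = {"weak", "fail", "bad", "lose", "worst"}

    def score(tweet):
        words = tweet.lower().split()
        return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

    def candidate_index(tweets, candidate):
        relevant = [t for t in tweets if candidate.lower() in t.lower()]
        if not relevant:
            return 0.0
        return sum(score(t) for t in relevant) / len(relevant)

    sample_tweets = [
        "Obama gave a strong speech, great stuff",
        "Romney had a bad night, weak answers",
        "Hope Obama can win this one",
    ]
    for name in ("Obama", "Romney"):
        print(name, candidate_index(sample_tweets, name))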

But how far can we trust the Twindex? Probably not very far. Writer Richard Adhikari observes:

The Pew Research Center has found that only 15 percent of adults online use Twitter. On a typical day, that figure is only 8 percent. . . .

“Overall, nearly 30 percent of young adults use Twitter, up from 18 percent the previous year. One in five people aged 18 to 24 uses Twitter on a typical day.

“Further, 11 percent of adults aged 25 to 34 use Twitter on a typical day.

“African-Americans are also heavy Twitter users, with 28 percent of them using Twitter overall and 13 percent doing so on a typical day.

“Urban and suburban residents are also significantly more likely to use Twitter than those in rural areas, Pew found.”

So, yeah, statistically Democrats are likely to fare better among Twitter users than Republicans. This index is about as valuable as any political echo chamber—for entertainment only. Personally, I’d rather be washing windows.

Cynthia Murrell, August 31, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Document Management Is Ripe For eDiscovery

July 18, 2012

If you work in any capacity related to the legal community, you should be aware that eDiscovery generates a great deal of chatter. As with most search and information retrieval functions, progress is erratic.

While eDiscovery, according to the marketers who flock to Legal Tech and other conferences, will save clients and attorneys millions of dollars in the long run, there will still be costs associated with it. Fees do not magically disappear, and eDiscovery has its own costs that can accrue, even if they are a tad lower than the regular attorney’s time sheets.

One way to keep costs down is to create a document management policy; if you are ever taken to court, it will reduce the time and money spent in the litigation process. We have mixed feelings about document management. The systems are often problematic because the management guidance and support are inadequate. Software cannot “fix” this type of issue. Marketers, however, suggest software may be up to the task.

JD Supra discusses the importance of a document management plan in “eDiscovery and Document Management.” The legal firm of Warner, Norcross, and Judd wrote a basic strategy guide for JD Supra for people to get started on a document management plan. A plan’s importance is immeasurable:

“With proper document management, you’ll have control over your systems and records when a litigation hold is issued and the eDiscovery process begins, resulting in reduced risk and lower eDiscovery costs. This is imperative because discovery involving electronically stored data — including e-mail, voicemail, calendars, text messages and metadata — is among the most time-consuming and costly phases of any dispute. Ultimately, an effective document management policy is likely to contribute to the best possible outcome of litigation or an investigation.”

The best way to start working on a plan is to outline your purpose and scope: know what you need and want the plan to do. Also specify who will be responsible for each part of the plan; not designating proper authority can leave the entire plan in limbo. Never forget a records retention policy: most data must legally be kept for seven years or permanently, but some data can be deleted, and you should not pay to store data you do not have to keep. Most important of all, provide specific direction for individual tasks, such as scanning, word management, destruction schedules, and observing litigation holds. One last thing: never underestimate the importance of employee training and audit schedules; the latter will sneak up on you before you know it.
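
To make the retention idea concrete, here is a minimal Python sketch of a retention schedule with a litigation hold override. The categories and retention periods are invented for illustration; an actual schedule should come from counsel, not from a blog post.

    # Minimal sketch of a records retention schedule with a litigation hold
    # override. Categories and periods are invented for illustration only.
    from datetime import date, timedelta

    RETENTION_YEARS = {   # None means keep permanently
        "contracts": None,
        "email": 7,
        "drafts": 1,
    }

    def may_destroy(category, created, on_litigation_hold, today=None):
        """True only if the record is past retention and not on a hold."""
        if on_litigation_hold:
            return False          # a litigation hold trumps the schedule
        years = RETENTION_YEARS.get(category)
        if years is None:
            return False          # permanent records are never purged
        today = today or date.today()
        return created + timedelta(days=365 * years) < today

    print(may_destroy("email", date(2004, 1, 15), on_litigation_hold=False))  # True
    print(may_destroy("email", date(2004, 1, 15), on_litigation_hold=True))   # False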

If, however, you are still hesitant, be aware that not having a plan can carry some hefty consequences:

  • “Outdated and possibly harmful documents might be available and subject to discovery.
  • Failure to produce documents in a timely fashion might result in fines and jail time: one large corporation was charged with misleading regulators and not producing evidence in a timely matter and was fined $10 million.
  • Destroying documents in violation of federal statutes and regulations may result in fines and jail time: one provision of the Sarbanes-Oxley Act specifies a prison sentence of up to 20 years for someone who knowingly destroys documents with the intent to obstruct a government investigation.”

A document management plan is a tool meant to guide organizations in managing their data, outlining the tasks associated with it, and preparing for eventual audits and litigation procedures. Having a document management plan in place will make the eDiscovery process go more quickly, but another way to make the process even faster and more accurate is to use litigation support technology and predictive coding, such as that provided by Polyspot.

Here at Beyond Search we have a healthy skepticism for automated content processing. Some systems perform quite well in quite specific circumstances. Examples include Digital Reasoning and Ikanow. Other systems are disappointing. Very disappointing. Who are the disappointing vendors? Not in this free blog. Sign up for Honk!, our no holds barred newsletter, and get that opt-in, limited distribution information today.

Whitney Grace, July 18, 2012

Sponsored by Polyspot

Trimming Legal Costs and Jobs: A Predictive Coding Unintended Consequence?

July 17, 2012

Predictive coding and eDiscovery are circling the legal community’s gossip rings with questions about what they mean for the future of legal costs and jobs. The Huffington Post addresses the topic in “ ‘Lawyerbots’ Offer Attorneys Faster, Cheaper Assistants.” The US court system has adopted new rules governing eDiscovery technology and how it can be used in court cases. Lawyers, legal professionals, and even the companies licensing various programmatic content processing systems are struggling to understand the upside and downside of the algorithmic approach to coding. One way eDiscovery and predictive coding will be used is to cut down on the many, many hours spent processing electronic documents. This new technology is being referred to as “lawyerbots.”
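
For readers curious about what sits under the “lawyerbot” label, the sketch below shows the general technology assisted review pattern: attorneys label a small seed set of documents, a classifier learns from the labels, and the software ranks the unreviewed collection. It uses scikit-learn with invented documents and is a conceptual illustration, not any vendor’s actual system.

    # Sketch of the technology assisted review pattern: label a seed set,
    # train a classifier, and rank the unreviewed documents by predicted
    # responsiveness. Documents and labels are invented placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    seed_docs = [
        "email discussing the disputed supply contract terms",
        "memo on pricing changes for the contract at issue",
        "cafeteria menu for the week of March 5",
        "office party planning thread",
    ]
    seed_labels = [1, 1, 0, 0]  # 1 = responsive, 0 = not responsive

    vectorizer = TfidfVectorizer()
    model = LogisticRegression().fit(vectorizer.fit_transform(seed_docs), seed_labels)

    # Score the unreviewed collection; likely responsive documents surface first.
    unreviewed = [
        "draft amendment to the supply contract",
        "reminder to submit parking passes",
    ]
    scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]
    for doc, prob in sorted(zip(unreviewed, scores), key=lambda pair: -pair[1]):
        print(round(float(prob), 2), doc)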

Lawyerbots cut through the man-hours like an electric knife, saving time and clients money. Many are optimistic about the changes. But some clients are ambivalent:

“But how will clients feel about a computer doing some of the dirty work, instead of a lawyer or paralegal manually digging through documents? Some could be concerned that a computer is more apt to make an error, or overlook crucial information. In a recent study in the Richmond Journal of Law and Technology, lawyer labor was tested against lawyerbots with predictive coding software. Researchers found “evidence that such technology-assisted processes, while indeed more efficient, can also yield results superior to those of exhaustive manual review.” In basic terms, the computers had the humans licked.”

Faster and more accurate! It is an awesome combination, but the next question to follow is what about jobs? There are several predictions already out there; the article mentions how Mike Lynch of Autonomy believes the legal community will employ fewer people in the future. Others are embracing the new technology pattern and plan to see changes as the older lawyers retire. Here’s one observation:

“Jonathan Askin, the director of Brooklyn Law School’s Brooklyn Law Incubator and Policy Clinic (BLIP)…said, ‘When I look around at my peers, I see 40-year-old lawyers who are still communicating via snail mail and fax machines and telephones and appearing in physical space for negotiations.’ He said he hopes to better merge the legal sector and technology to serve both lawyers and their clients more efficiently.”

We arrive at yet another crossroads: the traditional, variable cost approach vs. a new, allegedly more easily budgeted approach to content analysis.

As a librarian, I predict, without having to use predictive analytics, that eDiscovery will eliminate some legal jobs. Online wreaked havoc in the special library market. However, I am confident that there will still be a need for humans to keep the lawyerbots, and maybe the marketers of these systems, in check.

After all, software technology is only as smart as the humans who program it, and humans are prone to error. The lawyerbots will also drive down costs, a blessing in this poor economy, and more people will be apt to bring cases to court, increasing demand for lawyers. In order to get to this point, however, there needs to be an established set of standards covering how litigation support software can be programmed, how it can be used, and the basic requirements for the processes and code. What’s the outlook? Uncertainty, and probably one step forward and one step backward.

Whitney Grace, July 17, 2012

Sponsored by Polyspot

Google and Latent Semantic Indexing: The KnowledgeGraph Play

June 26, 2012

One thing that is always constant is Google changing itself. Not too long ago Google introduced yet another new tool: Knowledge Graph. Business2Community spoke highly about how this new application proves the concept of latent semantic indexing in “Keyword Density is Dead…Enter ‘Thing Density.’” Google’s claim to fame is providing the most relevant search results based on a user’s keywords. Every time Google updates its algorithm, it is to keep relevancy up. The new Knowledge Graph allows users to break down their search by clustering related Web sites and finding the latent semantic relationships among the results. From there the search conducts a secondary search, and so on. Google does this to reflect the natural use of human language; that is, to make its products user friendly.

But this change raises an important question:

“What does it mean for me!? Well first and foremost keyword density is dead, I like to consider the new term to be “Concept Density” or to coin Google’s title to this new development “Thing Density.” Which thankfully my High School English teachers would be happy about. They always told us to not use the same term over and over again but to switch it up throughout our papers. Which is a natural and proper style of writing, and we now know this is how Google is approaching it as well.”

The change means good content and SEO will be rewarded. This does not change the fact, of course, that Google will probably change its algorithm again in a couple of months, but now the company is recognizing that LSI has value. Most vendors that provide latent semantic indexing, content, and text analytics, such as Content Analyst, have gone well beyond Google’s offering, using the latest LSI methods to make data more findable and to discover new correlations.
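
For the curious, here is a minimal latent semantic indexing sketch using scikit-learn. It projects a toy corpus into a couple of latent “concepts” so that documents about the same thing score as similar even when the exact keywords differ; the corpus is invented, and the example says nothing about Google’s actual implementation.

    # Minimal latent semantic indexing sketch: project documents into a few
    # latent "concepts" so texts about the same thing score as similar even
    # when the exact keywords differ. The toy corpus is invented.
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "search engine relevancy and ranking",
        "ranking results for a query engine",
        "centrifuge coolant sludge and pump maintenance",
        "pump and valve maintenance schedules",
    ]

    tfidf = TfidfVectorizer().fit_transform(docs)
    concepts = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

    # Row 0 of the similarity matrix: document 0 against every document.
    print(cosine_similarity(concepts)[0].round(2))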

Whitney Grace, June 26, 2012

Sponsored by Content Analyst

The Alleged Received Wisdom about Predictive Coding

June 19, 2012

Let’s start off with a recommendation. Snag a copy of the Wall Street Journal and read the hard copy front page story in the Marketplace section, “Computers Carry Water of Pretrial Legal Work.” In theory, you can read the story online if you don’t have Sections A-1, A-10 of the June 18, 2012, newspaper. A variant of the story appears as “Why Hire a Lawyer? Computers Are Cheaper.”

Now let me offer a possibly shocking observation: The costs of litigation are not going down for certain legal matters. Neither bargain basement human attorneys nor Fancy Dan content processing systems make the legal bills smaller. Your mileage may vary, but for those snared in some legal traffic jams, costs are tough to control. In fact, search and content processing can impact costs, just not in the way some of the licensees of next generation systems expect. That is one of the mysteries of online that few can penetrate.

The main idea of the Wall Street Journal story is that “predictive coding” can do work that human lawyers do for a higher cost but sometimes with much less precision. That’s the hint about costs in my opinion. But the article is traditional journalistic gold. Coming from the Murdoch organization, what did I expect? i2 Group has been chugging along with relationship maps for case analyses of important matters since 1990. Big alert: i2 Ltd. was a client of mine. Let’s see: that was more than a couple of weeks ago that basic discovery functions were available.

The write up quotes published analyses which indicate that when humans review documents, those humans get tired and do a lousy job. The article cites “experts” from Thomson Reuters, a firm steeped in legal and digital expertise, who point out that predictive coding is going to be an even bigger business. Here’s the passage I underlined: “Greg McPolin, an executive at the legal outsourcing firm Pangea3 which is owned by Thomson Reuters Corp., says about one third of the company’s clients are considering using predictive coding in their matters.” This factoid is likely to spawn a swarm of azure chip consultants who will explain how big the market for predictive coding will be. Good news for the firms engaged in this content processing activity.

Which grows faster: the costs of a legal matter or the costs of a legal matter that requires automation and trained attorneys? Why do companies embrace automation plus human attorneys? Risk certainly is a turbocharger.

The article also explains how predictive coding works, offers some cost estimates for various actions related to a document, and adds some cautionary points about predictive coding proving itself in court. In short, we have a touchstone document about this niche in search and content processing.

My thoughts about predictive coding are related to the broader trends in the use of systems and methods to figure out what is in a corpus and what a document is about.

The driver for most content processing is related to two quite human needs. First, the costs of coping with large volumes of information are high and going up fast. Second, there is the need to reduce risk. Most professionals find quips about orange jump suits, sharing a cell with Mr. Madoff, and the iconic “perp walk” downright depressing. When a legal matter surfaces, the need to know what’s in a collection of content like corporate email is high. The need for speed is driven by executive urgency. The cost factor kicks in when the chief financial officer has to figure out the costs of determining what’s in those documents. Predictive coding to the rescue. One firm used the phrase “rocket docket” to communicate speed. Other firms promise optimized statistical routines. The big idea is that automation is fast and cheaper than having lots of attorneys sifting through documents in printed or digital form. The Wall Street Journal is right. Automated content processing is going to be a big business. I just hit the two key drivers. Why dance around what is fueling this sector?

Read more

Protected: Pitney Bowes and iDiscovery Tack on Analytics

June 15, 2012


Microsoft SharePoint: Controlled Term Functionality

June 13, 2012

“SharePoint Search, Synonyms, Thesaurus, and You” provides a useful summary of Microsoft SharePoint’s native support for controlled term lists. Today, the buzzwords taxonomy and ontology are used to refer to term lists which SharePoint can use to index content. Term lists may consist of company-specific vocabulary, the names of people and companies with which a firm does business, or formal lists of words and phrases with “Use for” and “See also” cross references.

The importance of a controlled term list is often lost when today’s automated indexing systems process content. Almost any search system benefits when the content processing subsystem can use a controlled term list as well as the automated methods baked into the indexer.

In this TechGrowingPains write up, the author says:

A little known, and interesting, feature in SharePoint search is the ability to create customized thesaurus word sets. The word sets can either be synonyms, or word replacements, augmenting search functionality. This ability is not limited to single words, it can also be extend into specific phrases.

The article explains how controlled term lists can be used to assist a user in formulating a query. The method is called “replacement words”. The idea of suggesting terms is a good one which many users find a time saver when doing research. The synonym expansion function is mentioned as well. SharePoint can insert broader terms into a user’s query which increases or decreases the size of the result set.

The centerpiece of the article is a recipe for activating this functionality. A helpful code snippet is included as well.
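
To show the difference between the two behaviors in plain terms, here is a small Python sketch of replacement words versus synonym expansion. The term lists are invented, and the snippet is a conceptual illustration, not SharePoint’s actual thesaurus configuration syntax.

    # Plain illustration of the two thesaurus behaviors: "replacement words"
    # rewrite a query term outright, while synonym expansion adds terms and
    # broadens the result set. Invented term lists; this is not SharePoint's
    # actual thesaurus configuration syntax.
    REPLACEMENTS = {"hr": "human resources"}         # pattern -> substitute
    EXPANSIONS = {"invoice": ["bill", "statement"]}  # term -> extra synonyms

    def rewrite_query(query):
        terms = []
        for term in query.lower().split():
            term = REPLACEMENTS.get(term, term)      # replacement: swap the term
            terms.append(term)
            terms.extend(EXPANSIONS.get(term, []))   # expansion: add synonyms
        return " OR ".join(terms)

    print(rewrite_query("HR invoice"))
    # human resources OR invoice OR bill OR statement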

If you want additional technical support, let us know. Our Search Technologies team has deep experience in Microsoft SharePoint search and customization. We can implement advanced controlled term features in almost any SharePoint system.

Iain Fletcher, June 13, 2012

Autonomy Offers Automatic Classification and Taxonomy Generation

May 7, 2012

Conceptualizing the processes and methods behind the storage and organization of data in our current age ruled by unstructured content and meta-tags can prove overwhelming. We found a great source of information from Autonomy, which explains their offering of Automatic Classification and Taxonomy Generation.

With their eye on functionality, IDOL’s classification solutions help users to circumvent issues that have arisen in a time of exponential data growth.

In addition to Taxonomy Libraries and Automatic Categorization and Channels, the Autonomy Collaborative Classifier is included. Their website clearly delineates how these elements work.

The website states the following information regarding Taxonomy Libraries:

“Built by experienced knowledge engineers using best practices learned through hundreds of consulting engagements, Autonomy taxonomies let organizations rapidly deploy industry-standard taxonomies that can be combined with your corporate taxonomies or easily customized to meet company and industry-specific requirements. Each Autonomy taxonomy is based on industry standards, and built using IDOL’s conceptual analysis that provides the highest level of accuracy.”

IDOL includes a variety of taxonomies, ranging from biotechnology to financial services: a comprehensive solution, indeed. Overall, IDOL seems equipped to eliminate the time consuming manual intervention required in the past. But open source alternatives exist and should be considered by procurement teams.
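
For readers who want a feel for what automatic categorization against a taxonomy involves, here is a bare-bones Python sketch that scores a document against each node’s term list. The taxonomy and scoring are invented placeholders; IDOL’s conceptual analysis is proprietary and far more sophisticated.

    # Bare-bones automatic categorization against a taxonomy: score a
    # document against each node's term list and assign the best match.
    # Taxonomy and terms are invented; IDOL's conceptual analysis is a
    # proprietary and far more sophisticated method.
    TAXONOMY = {
        "Biotechnology": {"genome", "protein", "clinical", "assay"},
        "Financial Services": {"equity", "portfolio", "loan", "hedge"},
    }

    def categorize(text):
        words = set(text.lower().split())
        scores = {node: len(words & terms) for node, terms in TAXONOMY.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else "Uncategorized"

    print(categorize("The clinical assay measured protein expression levels"))
    # Biotechnology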

Megan Feil, May 9, 2012

Sponsored by Ikanow
