Protected: Manage SharePoint Metadata in Excel
September 30, 2011
Flock of Articles Provides Search Summary
September 18, 2011
Chris Dale’s rundown of the latest tech news is quite useful. “A Flock of Articles on Computer-Assisted Document Review” will quickly get you up to date on recent developments. Half a dozen articles are highlighted, with commentary provided for each.
Here’s Dale’s commentary:
Look next at an article in Legal Technology News by Farrah Pepper, of counsel at Gibson, Dunn and Crutcher called Robot Review: Will predictive coding win the trust of the courts? Like Judge Peck in the speech reported in my article, Farrah Pepper reviews some of the cases and learned papers.
The issue comes to the surface as Recommind applied for a patent for its predictive coding technology. He asserts:
Predictive coding software providers claim they can automate much of the document review process, with human guidance. Documents can be prioritized into likely order of importance, typically based on a “learning set” of documents coded up front by a subject matter expert, they explain. That essentially creates a rebuttable presumption of relevance for other coding that can be tested via sampling and revised if necessary. Then, the argument goes, the attorneys leading the case can dig into the substance a whole lot faster.
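The workflow described above (an expert codes a learning set, software ranks the rest, sampling tests the result) can be illustrated with a deliberately naive sketch. It assumes a simple smoothed log-odds term scorer, which is a stand-in chosen for illustration; it is not Recommind's patented method or any vendor's algorithm. The function names `train`, `prioritize`, and `sample_check` are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(learning_set):
    """learning_set: list of (text, is_relevant) pairs coded up front by an expert."""
    rel, nonrel = Counter(), Counter()
    for text, is_relevant in learning_set:
        # Count each term once per document (document frequency).
        (rel if is_relevant else nonrel).update(set(tokenize(text)))
    n_rel = sum(1 for _, r in learning_set if r)
    n_non = len(learning_set) - n_rel
    vocab = set(rel) | set(nonrel)
    # Smoothed log-odds: a positive weight means the term favours relevant documents.
    return {
        t: math.log((rel[t] + 1) / (n_rel + 2)) - math.log((nonrel[t] + 1) / (n_non + 2))
        for t in vocab
    }


def score(weights, text):
    return sum(weights.get(t, 0.0) for t in set(tokenize(text)))


def prioritize(weights, docs):
    """Order unreviewed documents by likely importance, highest first."""
    return sorted(docs, key=lambda d: score(weights, d), reverse=True)


def sample_check(ranked, cutoff, reviewer, k, seed=0):
    """Test the presumption: sample k documents ranked below the cutoff and
    return the fraction a human reviewer (a callable here) marks relevant."""
    tail = ranked[cutoff:]
    if not tail:
        return 0.0
    sample = random.Random(seed).sample(tail, min(k, len(tail)))
    return sum(1 for d in sample if reviewer(d)) / len(sample)


learning = [
    ("price fixing agreement with competitor", True),
    ("lunch menu for friday", False),
    ("competitor pricing call notes", True),
    ("holiday party schedule", False),
]
weights = train(learning)
ranked = prioritize(weights, ["notes on pricing agreement", "friday party menu", "competitor call summary"])
```

A high miss rate from `sample_check` would be the signal to revise the learning set and rerun the ranking, which mirrors the "tested via sampling and revised if necessary" step in the quote.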
Keep an eye on the matter as it continues to develop. The courts will have to decide the issue one way or another, because the technology will keep pushing forward.
Emily Rae Aldridge, September 18, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Hlava on Machine Assisted Indexing
September 8, 2011
On September 7, 2011, I interviewed Margie Hlava, president and co-founder of Access Innovations. Access Innovations has been delivering professional taxonomy, indexing, and consulting services to organizations worldwide for more than 30 years. In our first interview, Ms. Hlava discussed the need for standards and the costs associated with flawed controlled term lists and some loosely formed indexing methods.
In this podcast, I spoke with her about her MAI, or machine assisted indexing, technology. The idea is that automated systems can consistently tag high-volume flows of data. The “big data” challenge often creates significant performance problems for some content processing systems. MAI balances high speed processing with the ability to accommodate the inevitable “language drift” that is a natural part of human content generation.
In this interview, Ms. Hlava discusses:
- The value of a neutral format so that content and tags can be easily repurposed
- The importance of metadata enrichment, which allows an indexing process to capture the nuances of meaning as well as the tagging required to let a user “zoom” to a specific location in a document, pinpoint the entities in a document, and summarize documents automatically
- The role of an inverted index versus the tagging of records with a controlled vocabulary.
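The contrast in the last point can be shown with a toy example. The sketch below assumes a tiny, hypothetical controlled vocabulary (`VOCABULARY`) that maps each preferred term to its variants; it is not Access Innovations' MAI or Data Harmony software. An inverted index only knows the literal words in each document, while controlled-vocabulary tagging groups variant wordings under one preferred term.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: preferred term -> variant wordings.
VOCABULARY = {
    "Automobiles": {"car", "cars", "automobile", "automobiles"},
    "Taxation": {"tax", "taxes", "taxation", "levy"},
}


def tokens(text):
    return [w.strip(".,;:!?").lower() for w in text.split()]


def build_inverted_index(docs):
    """Map each literal word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokens(text):
            index[word].add(doc_id)
    return index


def tag_with_vocabulary(docs, vocabulary):
    """Tag each document with every preferred term whose variants it mentions."""
    tags = defaultdict(set)
    for doc_id, text in docs.items():
        words = set(tokens(text))
        for preferred, variants in vocabulary.items():
            if words & variants:
                tags[preferred].add(doc_id)
    return tags


docs = {1: "New car sales rose.", 2: "Automobile exports fell.", 3: "A levy on imports."}
index = build_inverted_index(docs)
tags = tag_with_vocabulary(docs, VOCABULARY)
```

Here a lookup for the literal word "car" in `index` finds only document 1, while the tag "Automobiles" covers documents 1 and 2. That gap is the kind of retrieval failure that flawed or missing indexing produces.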
One of the key points is that flawed indexing contributes to user dissatisfaction with some search and retrieval systems. She said, “Search is like standing in line for a cold drink on a hot day. No matter how good the drink, there will be some dissatisfaction with the wait, the length of the line, and the process itself.”
You can listen to the second podcast, recorded on August 31, 2011, by pointing your browser to http://arnoldit.com/podcasts/. You can get additional information about Access Innovations at this link. The company publishes Taxodiary, a highly regarded Web log about indexing and taxonomy related topics.
Stephen E Arnold, September 8, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
IxReveal Closes Deal with CFIX
September 4, 2011
CFIX is the Central Florida Information Exchange, a regional fusion center for certain government entities and professionals. IxReveal describes itself as “an innovative analytic software company focused on giving end users the ability to fuse and extract knowledge and insight from large amounts of electronic data.”
What makes the company interesting is that the firm’s technology can harmonize electronic data in almost any form. With data transformation and normalization costs skyrocketing, solutions which can help minimize the expense of converting data from one format to another are of increasing importance.
IxReveal positions its software as a “search and analysis” product. The firm’s system identifies concepts automatically. Furthermore, the system provides automatic analytics which allow a user to sidestep the “you don’t know what you don’t know” issue. IxReveal discerns trends, patterns, anomalies, and relationships in the electronic information processed. In addition, the system provides tools for fusing, managing, and analyzing processed information.
For more information about the company, point your browser toward www.ixreveal.com.
Stephen E Arnold, September 4, 2011
Sponsored by Pandia.com
Hlava on Indexing, Metadata, and Findability
September 1, 2011
On August 31, 2011, I spoke with Margie Hlava, president and co-founder of Access Innovations. The idea for a podcast grew out of our lunch chatter. I then brought her back to the ArnoldIT office and we recorded a conversation about the challenges of “after the fact” indexing. One of the key points surfacing in the interview is the importance of a specific work process required for developing an indexing approach. “Fire, ready, aim!” is a method which can undermine an otherwise effective search solution. In the podcast, Ms. Hlava makes three points:
- Today’s search systems are often making it difficult for users to locate exactly the information needed. Access Innovations’ software and services can change “search to found.”
- Support for standards is important. Once a controlled term list or other value adding indexing process has been implemented, Access Innovations makes it easy for clients to repurpose and move their metadata. Ms. Hlava said, “We are standards wonks.”
- Indexing and metadata are challenging tasks. On the surface, creating a word list looks easy, but errors in logic make locating information more difficult. Informed support and the right taxonomy management system are important. Access Innovations’ solutions are available as cloud services or as on premises installations.
The challenge is that automated content processing without controlled term lists creates a wide range of problems for users.
You can listen to the podcast by navigating to http://arnoldit.com/podcasts/. For more information about Access Innovations, point your browser to www.accessinn.com. Be sure to take a look at Access Innovations’ Web log, Taxodiary. Updated each day, the blog is at www.taxodiary.com.
Stephen E Arnold, September 1, 2011
Sponsored by Pandia.com
The Governance Air Craft Carrier: Too Big to Sail?
August 31, 2011
In a few days, I disappear into the wilds of a far-off land. In theory, a government will pay me, but I am increasingly doubtful of promises made 3,000 miles from Harrod’s Creek. As part of the run-up to my departure, we held a mini webinar/consultation on Tuesday, August 30, 2011, with a particularly energetic company engaged in “governance.” (SharePoint Semantics has dozens of articles about governance. One example is “A Useful Guide to SharePoint Success from Symon Garfield.”) The format of the call was basic. The people on the call asked me questions, and I provided only the perspective that three score years and as many online failures can provide. (I will mention SharePoint, but my observations apply to other systems as well; for instance, Documentum, Interwoven, FileNet, etc.)
What I want to do in this short write-up is identify a subject that we did not tackle directly in that call, which concerned a government project. However, after the call, I realized that what I call an “air craft carrier” problem was germane to the discussion of automated indexing and entity extraction. An air craft carrier today is built with modular construction. The idea is that the flight deck is made by one or more vendors, moved to the assembly point, and bolted down. The same approach is taken with cabins, electronics, and weapon systems.
The basic naval engineering best practice is to figure out how to get the design nailed down. Who wants to have propeller assemblies arrive that do not match the hull clearance specification?
What’s an air craft carrier problem? An air craft carrier is a big ship. It is, according to my colleague Rick Fiust, a former naval officer, a “really big ship.” Unlike a rich person’s yacht or a cruise ship, an air craft carrier does more than surprise with its size. Air craft carriers pack a wallop. In grade school I remember learning the phrase “gun boat diplomacy.” The idea was that a couple of gun boats send a powerful message.
An air craft carrier is what every content centric system aspires to be. Some information technology professionals will tell their bosses or clients, “You have a state of the art search and content processing system. Everything works.” In my experience, that is unlikely.
Governance, or what I like to think of as “editorial policy,” is an air craft carrier. The connotation of governance is broad, involves many different functions, and sends a powerful message. The problem is that when content in an organization becomes unmanageable, the air craft carrier runs aground and the crew is not exactly sure what to do about the problem.
Consider this real life example. A well meaning information technology manager installs SharePoint to allow the professionals in marketing to share their documents, price lists, and snippets from a Web site. Then the company acquires another firm, which runs SharePoint as well as a handful of enterprise applications. On the surface, the situation looks straightforward. However, the task of getting the two organizations’ systems to work smoothly is a bit tricky. There are the standard challenges of permissions and access as well as somewhat more exotic ones of coping with intra-unit indexing and index refreshes. Then a third company is acquired, and it runs SharePoint. Unlike the first two installations, which were “by the book,” the third company’s information technology unit used SharePoint as a blank canvas: it created specialized features and services, plugged in third party components, and added some home grown code.
Now the content issue arises: what content is available, when, to whom, and under what circumstances? Because the SharePoint installation was built in separate modules over time, will these modules fit together? Nope. There was no equivalent of the naval engineering best practice.
Governance, in my opinion, is the buzz word slapped on content centric systems of which SharePoint is but one example. The same governance problem surfaces when multiple content centric systems are joined.
Will after the fact governance solve the content problems in a SharePoint or other content centric environment? In my experience, the answer is, “Unlikely.” There are four reasons:
Cost. Reworking three systems built on the same platform should be trivial, but it is not. The work is difficult, and in some situations, scrapping the original three systems and starting over may be the more cost effective solution. Who knows what interdependencies lurk within the three systems which are supposed to work as one? Open ended engineering projects are likely to encounter funding problems, and the systems must then be used “as is” or fixed one problem at a time.
MetaCarta Offers Geotagging Plug-In
August 19, 2011
Geospatial context is the linchpin for cultural and human ecosystem modeling and analysis. Concept templates can guide models, allowing professionals to consider economic, religious, political, and geographic features simultaneously. According to the Thetus blog post “Geotagging with MetaCarta,” MetaCarta’s new plug-in is a solution for creating such models.
MetaCarta’s GeoSearch Toolkit plug-in for Apache Solr, an open source high performance search and index, gives us the ability to combine geographic search constraints such as bounding boxes and heat maps with the many other semantic and text-based search inputs that we have built up using Solr. This toolkit from MetaCarta allows us to run geo-aware searches through one unified and high performance search engine, rather than needing to conglomerate geographic search results from one data source with semantic search results from another source.
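The benefit the quote describes, applying a geographic constraint and a text constraint in one pass instead of merging results from two engines, can be sketched in a few lines. This is a plain Python illustration under stated assumptions: the `Record` and `BoundingBox` types are hypothetical, matching is a naive "all query words present" test, and the code does not use Solr's spatial features or MetaCarta's toolkit API.

```python
from dataclasses import dataclass


@dataclass
class Record:
    doc_id: str
    text: str
    lat: float
    lon: float


@dataclass
class BoundingBox:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat, lon):
        # Simplification: boxes that cross the 180th meridian are not handled.
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


def geo_text_search(records, query, box):
    """Return ids of records that match every query word AND fall inside the box."""
    terms = set(query.lower().split())
    return [
        r.doc_id
        for r in records
        if terms <= set(r.text.lower().split()) and box.contains(r.lat, r.lon)
    ]


records = [
    Record("a", "market report", 38.25, -85.76),  # near Louisville, Kentucky
    Record("b", "market report", 51.50, -0.13),   # London
    Record("c", "weather report", 38.30, -85.70),
]
box = BoundingBox(37.0, -87.0, 39.0, -84.0)
hits = geo_text_search(records, "market report", box)
```

In this toy data, only record "a" satisfies both constraints. A production system would push both filters into the search index itself rather than scanning a list, which is the point of the unified engine described in the quote.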
The GeoSearch plug-in by MetaCarta makes a lot of sense for professionals seeking ease and speed when incorporating geographic data into their work. Geography is certainly a specialized field, and those not well versed in its intricacies often choose to stay away altogether. Perhaps software such as this offering by MetaCarta can make geography a user-friendly affair. Thetus keeps a low profile, but the company continues to move forward with commercial and government work.
Emily Rae Aldridge, August 19, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Access Innovations Expands, Supports Medical Coding
August 17, 2011
We learned from one of our readers that Access Innovations has expanded into an exciting new area: medical coding and analysis. In our opinion, the company is one of the leaders in taxonomy and controlled-term related systems and services, delivering solutions that reduce errors and costs.
According to our reader:
Access Innovations, Inc., a leader in data integrity and content creation, has announced the Access Innovations Integrity Initiative (AI³), a suite of tools and services for quality assurance and validation of medical coding. Access Innovations Integrity Initiative is not just for physicians, hospitals, and their data service providers. It also includes tools that give auditors and insurers the information management tools they need to quickly identify areas of noncompliance or suspicious activity.
Margie Hlava, whom we interviewed a few weeks ago, told us:
We are dedicated to productivity and cost savings as a company. This new application of our long-standing tool set enables a radical departure from other less consistent and accurate tools. These are the tools used in scholarly publishing and other information activities for many years. Applying ANSI standard-compliant Data Harmony tools to the health arena, coupled with our support of automated coding accuracy, means cost savings as well as increased precision.
Why the expansion at a time when dozens of search and content processing companies are struggling to find shelter in the financial hail storms which buffet many vendors? According to Ms. Hlava,
Coding mistakes or improper coding adds to the cost of health care throughout the service chain. AI³ can lower those administrative costs. An initial consultation leads to the development of an automated audit-trigger analysis, identifying inefficiencies and inaccuracies based on records, notes, or other supplied data. A rules-based approach allows for the analysis of dynamic data sets, unlike a purely statistical approach, which quickly becomes suboptimal as more data is entered. The system can be used to quickly and accurately validate medical coding or to locate errors in existing documentation. Our technology delivers cost savings without compromise.
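To make the idea of a rules-based audit trigger concrete, here is a minimal sketch. Everything in it is hypothetical: the `ClaimRecord` and `Rule` types, the placeholder codes "CODE-A" through "CODE-C", and the three rules are invented for illustration and are not real medical coding rules or the AI³ implementation. The sketch shows why a rule set, unlike a fitted statistical model, can be rerun unchanged as new records arrive.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClaimRecord:
    record_id: str
    codes: set
    notes: str = ""


@dataclass
class Rule:
    name: str
    violated: Callable[[ClaimRecord], bool]


# Hypothetical rules with placeholder codes; real rules would come from
# coding standards and payer policies, not from this example.
RULES = [
    Rule("conflicting codes", lambda r: {"CODE-A", "CODE-B"} <= r.codes),
    Rule("missing supporting note", lambda r: "CODE-C" in r.codes and not r.notes.strip()),
    Rule("no codes assigned", lambda r: not r.codes),
]


def audit(records, rules=RULES):
    """Return {record_id: [names of triggered rules]} for records that need review."""
    findings = {}
    for rec in records:
        hits = [rule.name for rule in rules if rule.violated(rec)]
        if hits:
            findings[rec.record_id] = hits
    return findings


records = [
    ClaimRecord("r1", {"CODE-A", "CODE-B"}),
    ClaimRecord("r2", {"CODE-C"}, "exam documented"),
    ClaimRecord("r3", {"CODE-C"}),
    ClaimRecord("r4", set()),
]
findings = audit(records)
```

Record "r2" passes because its code is supported by a note, while the other three are flagged for a human auditor. Adding a new rule means appending to `RULES`, with no retraining step.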
For more information about Access Innovations’ services, navigate to Access Innovations Integrity Initiative. For more information about the firm’s landmark technology, navigate to this product catalog.
How do I know the company’s approach works? We used this system when I was working at the commercial database company producing ABI/INFORM, Business Dateline, and other high value, profitable databases.
Stephen E Arnold, August 17, 2011
Sponsored by Pandia.com
Thoughts from an Industry Leader: Margie Hlava, Access Innovations
August 4, 2011
Here are some astute observations on the direction of enterprise search from someone who knows what she’s talking about. Library Technology Guides points to an interview with Margie Hlava, president of Access Innovations, in “Access Innovations founder and industry pioneer talks about trends in taxonomy and search.”
Ms. Hlava’s 33 years in the search industry inform her observations on current trends, three of which she sees as significant: Cloud and Software as a Service (SaaS) computing, term mining, and the demand for metadata.
The move to the Cloud and SaaS computing demands more of our hardware, not less, Hlava insists. In particular, broadband networks are struggling to keep up. One advantage of the shift is a declining need to navigate labyrinths of hardware, software, and even internal politics on the client side. Other pluses are the motion toward increased data sharing and service enhancement. Also, more ways to maintain security and intellectual property rights are on the horizon.
Hlava describes term mining as “a process involving conceptual extraction using thesaurus terms and their synonyms with a rule-base, then looking for occurrences to create more detailed data maps.” Her company leverages this concept to make the most of clients’ large data sets. She is interested in new angles like mashups, data fusion, visualization, linked data, and personalization, but with a caveat: success in all these depends on the quality of the data itself. “Rotten data gives rotten results.”
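Hlava's definition of term mining has three parts: thesaurus terms and their synonyms, a rule base, and occurrence counts that feed a data map. The sketch below is one simple reading of that definition, not Access Innovations' implementation. The thesaurus entries, the `CONTEXT_RULES` table, and the context-word rule itself (used here to reject an ambiguous hit such as "MI" meaning Michigan) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical thesaurus: preferred term -> synonyms (lower case).
THESAURUS = {
    "myocardial infarction": ["heart attack", "myocardial infarction", "mi"],
    "hypertension": ["high blood pressure", "hypertension"],
}

# Hypothetical rule base: a preferred term only counts if the document also
# contains at least one of these context words.
CONTEXT_RULES = {
    "myocardial infarction": {"patient", "cardiac", "chest"},
}


def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def count_phrase(tokens, phrase):
    """Count occurrences of a (possibly multi-word) phrase in a token list."""
    p = phrase.split()
    return sum(1 for i in range(len(tokens) - len(p) + 1) if tokens[i:i + len(p)] == p)


def term_map(docs):
    """Return {doc_id: Counter(preferred term -> occurrences)}: a simple data map."""
    result = {}
    for doc_id, text in docs.items():
        toks = words(text)
        counts = Counter()
        for preferred, synonyms in THESAURUS.items():
            n = sum(count_phrase(toks, s) for s in synonyms)
            required = CONTEXT_RULES.get(preferred)
            # Apply the rule base before accepting the hits.
            if n and (required is None or required & set(toks)):
                counts[preferred] = n
        result[doc_id] = counts
    return result


docs = {
    "d1": "Patient had a heart attack; MI confirmed. History of high blood pressure.",
    "d2": "Flights to Detroit, MI depart daily.",
}
data_map = term_map(docs)
```

In "d1", both synonyms roll up to "myocardial infarction" and are counted twice. In "d2", the rule base rejects the lone "MI". This is a small instance of her caveat that the quality of the underlying terms and rules determines the quality of the result.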
Ms. Hlava regards taxonomies and other metadata enrichment as the way to bring efficiency to our searches. In that realm, the benefits have only begun:
“In terms of taxonomies and search, ‘I think we have just scratched the surface. With good data, our clients are in a good position to do an incredible array of new and interesting things. Good taxonomies take everything to the next level, forming the basis of not only mashups, but also author networks, project collaborations, deeper and better information retrieval,’ she concluded.”
Wise words from a wise woman. We look forward to observing these predictions take shape as the search industry moves forward. The interview with Margie Hlava can be read in full here.
Access Innovations offers a wide range of content management services. The company has been building its semantic-based solutions for over thirty years and prides itself on its unique tool set and experienced personnel.
Stephen E Arnold, August 4, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Indexing a Good Start, not a Solution
July 28, 2011
We love it when 20 somethings discover the wheel, fire, or song. Almost as exciting is the breakthrough that allows today’s digital “experts” to see value in indexing. Yes.
InfoWorld asks, “Can Metadata Save Us from Cloud Data Overload?”
The simple answer: not by itself.
Writer David Linthicum acknowledges that the rapid and redundant proliferation of data demands action beyond moving it all to the Cloud. Many think that metadata is the solution. Properly tagging your data is necessary, but the big picture is more complicated than that. He states:
The management of data needs to be in the context of an overreaching data management strategy. That means actually considering the reengineering of existing systems, as well as understanding the common data elements among the systems. Doing so requires much more than just leveraging metadata; it calls for understanding the information within the portfolio of applications, cloud or not. It eventually leads to the real fix. The problem with this approach is that it’s a scary concept to consider.
Well, sort of. But indexing is not horseshoes; close does not count. People and businesses often, though unwisely, spin their wheels looking for easy solutions rather than confront an overwhelming reality. The truth is that indexing is not a silver bullet. There are issues related to editorial policy, use of a controlled term list, and quality control.
The sooner companies face this fact and get into the nuts and bolts of their data operations, the sooner they will reap the rewards of efficiency. In the meantime, we await the next big thing. We have heard it is drawing on cave walls. There’s an automated image indexing system ready to tag those graphic outputs too.
Stephen E Arnold, July 28, 2011
Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion