Vivisimo Now at Version 8 of Velocity

October 11, 2010

The news release fooled me. The title was “Vivisimo Releases Velocity 8.0, New Version of Its Market Leading Information Optimization Platform.” (News release links can go dark. You may have to poke around for the source document.) I continue to think of Vivisimo as a company with an on-the-fly clustering function that makes metasearch results useful. No more. Velocity 8.0 is an “information optimization platform”, a phrase that means about as much to me as “taxonomy governance,” about which I commented on LinkedIn last week.

Terminology aside, the new release of Velocity includes:

  • Hit boosting. The idea is that a certain piece of information can be placed at the top of a results list (see the sketch after this list)
  • Support for Microsoft SharePoint
  • Tweaks to scaling
  • A query autocomplete function.
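
Hit boosting is easy to picture in code. Below is a minimal sketch of the idea (my illustration, not Vivisimo’s implementation; the function and rule names are invented): an editorially chosen document is pinned above the organic ranking whenever the query contains a trigger term.

```python
# Minimal hit-boosting sketch (not Vivisimo's code): documents matching
# an editorially defined boost rule are pinned above the organic ranking.

def boost_hits(query, ranked_results, boost_rules):
    """Return ranked_results with boosted documents pinned to the top.

    ranked_results: list of document ids, best first.
    boost_rules: dict mapping a trigger term to the doc ids to promote.
    """
    promoted = [doc for term, docs in boost_rules.items()
                if term in query.lower() for doc in docs]
    # Organic results follow, with any promoted documents removed so
    # nothing appears twice.
    organic = [doc for doc in ranked_results if doc not in promoted]
    return promoted + organic

rules = {"benefits": ["hr-policy-2010.pdf"]}
print(boost_hits("employee benefits forms",
                 ["old-memo.doc", "hr-policy-2010.pdf", "faq.html"], rules))
# ['hr-policy-2010.pdf', 'old-memo.doc', 'faq.html']
```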

There are other enhancements. You can find these described at www.vivisimo.com. Hmmm. “Information optimization platform.” Another platform, another interesting way to describe information retrieval. Whatever works is okay with me.

Stephen E Arnold, October 11, 2010

Freebie

ZL Systems and TREC

August 13, 2010

I don’t write anything about TREC, the text retrieval conference “managed” by NIST (US Department of Commerce’s National Institute of Standards and Technology). The participants in the “tracks”, as I understand the rules, may not use the data for Madison Avenue-style cartwheels and reality distortion exercises.

The TREC work is focused on what I characterize as “interesting academic exercises.” Over the years, the commercial marketplace has moved in directions different from the activities in the TREC “tracks”. A TREC exercise is time consuming and expensive, and the results are difficult for tire kickers to figure out. In the last three years, the gap between the commercial market and academic analyses has widened. You may recall my mentioning that Autonomy had 20,000 customers and that Microsoft SharePoint has tens of millions of licensees. Each license includes search technology, and SharePoint has cultivated a fiercely competitive ecosystem of add-ons to “improve” findability. Google is chugging along without much worry about what’s happening outside the Googleplex unless it involves Apple, money, and lawyers. In short, research is one thing. Commercial success is quite another.

I was, therefore, interested to see “Study Finds that E-Discovery Using Enterprise-Wide Search Improves Results and Reduces Costs.” The information about this study appeared in ZL Technologies’ blog The Modern Archivist in June 2010. You can read the story “New Scientific Paper for TREC Conference”, which was online this morning (August 10, 2010). In general, information about TREC is hard to find. Folks who post links to TREC presentations often find that the referenced document is a very short item or no longer available. However, you can download the full “scientific paper” from the TREC Web site.

The point of the ZL write up is summarized in this passage:

Using two fully-independent teams, ZL tested the increased responsiveness of the enterprise-wide approach and the results were striking:  The enterprise-wide search yielded 77 custodians and 302 responsive email messages, while the custodian approach failed to identify 84% of the responsive documents.

The goose translates this to mean that there’s no shortcut when hunting for information. No big surprise to the goose, but probably a downer to those who like attention deficit disorder search systems.
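
For a rough sense of scale, and assuming the 84 percent miss rate is measured against the 302 responsive messages the enterprise-wide search found, the custodian approach would have surfaced only about 48 of them:

```python
# Back-of-the-envelope reading of the ZL figures (assumes the 84% miss
# rate applies to the 302 responsive messages).
responsive_total = 302
miss_rate = 0.84
found_by_custodian = round(responsive_total * (1 - miss_rate))
print(found_by_custodian)  # 48 of 302 responsive messages
```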

So what’s a ZL Technologies? The company says:

[It] provides cutting-edge enterprise software solutions for e-mail and files archiving for regulatory compliance, litigation support, corporate governance, and storage management. ZL’s Unified Archive offers a single unified platform to provide all the above capabilities, while maintaining a single copy and a unified policy across the enterprise. With a proven track record and enterprise clients which include top global institutions in finance and industry, ZL has emerged as the specialized provider of large-scale email archiving for eDiscovery and compliance.

Some information about TREC 2010 appears in “TREC 2010 Web Track Guidelines”. The intent is to describe one “track”, but the document also gives a broader picture of what’s going on for 2010. The “official” home page for TREC may be useful to some Beyond Search readers.

For more TREC information, you will have to attend the conference or contact TREC directly. The goose is now about to get his feathers ruffled about the availability of presentations that point out that search and retrieval has a long journey ahead.

Reality is often different from what the marketers present, in my opinion.

Stephen E Arnold, August 12, 2010

Freebie

Vamosa and SchemaLogic

March 24, 2010

A happy quack to the reader who took me to task for not covering SchemaLogic more diligently. When I check my Overflight service, I can tell quickly whether a search and content processing vendor is making marketing tracks. Autonomy is on the ball; many of the vendors I track are lacking in marketing savvy, marketing resources, or marketing energy.

I want to point to SchemaLogic’s tie up with Vamosa. SchemaLogic makes a controlled vocabulary server. The company has other technical capabilities, but I want to highlight the server product. With it, an organization can tame the wild ponies of uncontrolled tagging. SharePoint offers a users-can-do-it approach, and I think that uncontrolled tagging creates some interesting retrieval challenges. SchemaLogic’s server is a traffic cop, authority file, and repository. The software enforces some order on indexing, or “metatagging” as the 20-somethings prefer to call it.
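
To make the traffic cop role concrete, here is a minimal sketch of what a controlled vocabulary service does (my illustration, not SchemaLogic’s code; the authority file entries are invented): free-form tags are normalized to preferred terms, or rejected, before they reach an index.

```python
# Minimal controlled-vocabulary sketch (not SchemaLogic's code): tags
# are checked against an authority file, and synonyms are normalized to
# the preferred term before indexing.

AUTHORITY = {
    # preferred term -> accepted variants
    "human resources": {"hr", "personnel", "human resources"},
    "invoice": {"invoice", "bill", "billing"},
}

def normalize_tag(raw_tag):
    """Map a user-supplied tag to its preferred term, or reject it."""
    tag = raw_tag.strip().lower()
    for preferred, variants in AUTHORITY.items():
        if tag in variants:
            return preferred
    raise ValueError(f"'{raw_tag}' is not in the controlled vocabulary")

print(normalize_tag("HR"))       # human resources
print(normalize_tag("billing"))  # invoice
```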

Vamosa is a services firm, one of the many companies that offer consulting and information governance expertise to organizations. The idea is that in a SharePoint environment, people learn pretty quickly that there are problems “finding” information. Vamosa to the rescue.

The tie up allows Vamosa to offer a solution and SchemaLogic to get some marketing support. You can get details about the deal in the write up “Vamosa Adds More Content Governance Capabilities via MetaPoint.”

For information about Vamosa navigate to the firm’s Web site, www.vamosa.com. For information about SchemaLogic, you can find information at www.schemalogic.com.

Stephen E Arnold, March 24, 2010

Microsoft Document and Records Management

February 18, 2010

I received an email from a chipper PR type pitching me on “information governance.” After a bit of “who are you” and “why are you calling me”, I realized that “information governance” is marketing talk for document management. I am fighting a losing battle as I age. I know that there are different approaches to document management. If I want to keep track of documents, I use a document management system. If I have to keep track of documents for a nuclear power plant, I use a records management system. There’s a lot of talk about “information governance,” but I don’t have a clear sense of what is under that umbrella term.

To confuse me even more, I happened across a document called “Application Lifecycle Management.” The idea is that SharePoint applications have a sunrise, noon, and sunset. I will not talk about squalls, earthquakes, and landslides in this environmental metaphor for SharePoint applications. You can find this information on MSDN.

I wanted to know how Microsoft Fast search fit into this lifecycle management. I didn’t find much information, but I did locate two documents. One was titled “Introducing Document Management in SharePoint 2010” and the other was “Introducing Records Management in SharePoint 2010”. Both flowed from the keyboards of the Microsoft Enterprise Content Management Team Blog.

Okay, now I was going to learn how Microsoft perceived Document and Records Management.

Document Management

What about document management? Since the fine management performance at Enron and Tyco, among other companies, document management has become more important. The rules are not yet at the nuclear power plant repair level of stringency, but companies have to keep track of documents. The write up affirms that SharePoint used to be a bit recalcitrant when managing documents. Here’s the passage I noted:

As we looked at how our customers were starting to use the 2007 system’s DM features, we noticed an interesting trend: These features were not just part of managed document repository deployments. Indeed, the traditional DM features were getting heavy usage in average collaborative team sites as well. Customers were using them to apply policy and structure as well as gather insights from what otherwise would have been fairly unmanaged places. SharePoint was being used to pull more and more typically unstructured silos into the ECM world.

Those pesky customers! The Document Management write up runs down features in the new product. These include expanded metadata functions, with metadata serving as a “primary navigation tool.” Here’s a screen shot. Notice that there is no search box.

[Screen shot: SharePoint 2010 document management interface]

So much for finding information when the metadata may not be what the user anticipated. Obviously a document management system stores documents, transformations of documents like the old iPhrase, or pointers to documents or components of documents that reside “out there” on the network. The write up shifts gears to the notion of “an enterprise wiki and a traditional enterprise document repository.”
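
Metadata-first navigation amounts to filtering rather than querying. A minimal sketch of the mechanism (field names and values invented for illustration) shows both the appeal and the hazard: if the tag is not what the user anticipated, the filter returns nothing.

```python
# Metadata-as-navigation in miniature (field names invented for
# illustration): the user drills down by picking facet values instead
# of typing a query into a search box.

documents = [
    {"title": "Pump spec", "department": "engineering", "year": 2009},
    {"title": "Pipe weld report", "department": "engineering", "year": 2010},
    {"title": "Vendor contract", "department": "legal", "year": 2010},
]

def drill_down(docs, **facets):
    """Keep only documents whose metadata matches every chosen facet."""
    return [d for d in docs
            if all(d.get(k) == v for k, v in facets.items())]

print([d["title"] for d in drill_down(documents,
                                      department="engineering", year=2010)])
# ['Pipe weld report'] -- and a mistagged document would never surface
```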

Records Management

The Records Management write up did not tackle the nuclear power plant type of records management. The write up presented some dot points about records management; for example, retention and reports. Ah, reports. Quite useful when a cooling pipe springs a leak. One needs to know who did what when, with what materials, what did the problem look like before the repair, what did it look like after the repair, which manufacturer provided the specific material, etc.

The point of the write up struck me as “the power of metadata” or indexing. Now the hitch in the git along is that multiple information objects have to be tagged in a consistent manner. After all, when the pipe springs a leak, the lucky repair crew, dosimeters on their coveralls, need to read and see the information objects related to the problem. Yep, that means engineering drawings, photos, and sometimes lab tests, purchase orders, and handwritten notes inserted in the file.
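
The payoff of consistent tagging is that one lookup pulls every object type together. A sketch, with hypothetical identifiers:

```python
# Why consistent tagging matters (identifiers are hypothetical): every
# object about the same asset carries the same tag, so one lookup pulls
# drawings, photos, lab tests, and purchase orders together.

records = [
    {"type": "drawing",        "asset": "cooling-pipe-7", "file": "cp7.dwg"},
    {"type": "photo",          "asset": "cooling-pipe-7", "file": "cp7_before.jpg"},
    {"type": "lab test",       "asset": "cooling-pipe-7", "file": "weld_xray.pdf"},
    {"type": "purchase order", "asset": "cooling-pipe-9", "file": "po_1123.pdf"},
]

def everything_about(asset_id):
    return [r for r in records if r["asset"] == asset_id]

for r in everything_about("cooling-pipe-7"):
    print(r["type"], "->", r["file"])
# drawing -> cp7.dwg, photo -> cp7_before.jpg, lab test -> weld_xray.pdf
```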

My conclusion is that Microsoft content management, regardless of “flavor”, may be similar to Coca-Cola’s New Coke. I am not sure it will do what the company and the user expect.

Stepping Back

I know that thousands, possibly millions, of customers will use SharePoint for document and records management. I want to point out that using SharePoint to manage a Web site can be a tough job. My view is that until I see one of these systems up and running in a client organization, I am skeptical that SharePoint has the moxie to deliver either of these functions in a stable, affordable, scalable solution.

Even more interesting will be my testing of search and retrieval in both of these systems. With zero reference to search and a great dependence on the semi-magic word “taxonomy,” I think some users won’t have a clue where a particular document is and will have to hunt, which is time consuming and frustrating. In my experience, lawyers billing clients really thrive on hunting. Everyday business professionals may not be into this sport.

From a practical point of view, two posts, each built on a single platform with feature differences, confused me. Would not a single write up with one three-column table be a better way to explain these two versions of SharePoint?

In short, more confusion exists within the mind of the addled goose. The content management “experts” have created some pretty spectacular situations in organizations with SharePoint. Now it is off to the Sarbanes Oxley and Department of Energy school of “information governance.” Will SharePoint get an A or an F? Will SharePoint, shaped to the rigors of document management and records management, face a high noon or a Norwegian winter’s sunset?

Stephen E Arnold, February 18, 2010

No one paid me to write this. Since I mentioned nuclear energy, I will report my doing work for nothing to the DOE. I prefer the building next to White Flint Mall, which is now a white elephant in some ways.

SchemaLogic and Its MetaPoint

November 3, 2009

At the SharePoint conference, SchemaLogic announced its MetaPoint software. According to the company:

MetaPoint integrates with Microsoft Office to tag and classify documents automatically when they are created and suggests to the user where documents should be stored on the Microsoft Office SharePoint Server. MetaPoint helps employees find and share information more effectively while improving corporate information governance and regulatory compliance.

I have been notified about a number of products and systems that add functionality to SharePoint. I am having some fun trying to figure out which features and functions are the distinguishing ones. SchemaLogic offers its metadata repository to organizations eager to bring consistency to metadata across different enterprise software systems. The MetaPoint service offers similar functionality for SharePoint 2010.
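
The auto-tagging idea itself is straightforward to sketch. Assuming a rule table that maps document features to tags and storage locations (my invention; this is not the product’s API), a save-time hook could suggest both:

```python
# A sketch of save-time tag and location suggestion in the MetaPoint
# style (the rule table and names are invented, not the product's API).

RULES = [
    # (keyword in document, suggested tag, suggested library)
    ("invoice",  "finance/invoice", "sites/finance/Invoices"),
    ("contract", "legal/contract",  "sites/legal/Contracts"),
]

def suggest(document_text):
    """Suggest a tag and a SharePoint library for a new document."""
    text = document_text.lower()
    for keyword, tag, library in RULES:
        if keyword in text:
            return {"tag": tag, "store_in": library}
    return {"tag": "general", "store_in": "sites/shared/Documents"}

print(suggest("Contract for services, effective March 2010..."))
# {'tag': 'legal/contract', 'store_in': 'sites/legal/Contracts'}
```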

I am not able to endorse any particular SharePoint 2010 metadata management system at this time. This is on our to do list. In the meantime, procurement teams will have the opportunity to install, test, and evaluate these systems. Exciting and time consuming. SharePoint 2010 itself is a ton of fun, and integrating a third party metadata system will be an outstanding learning experience, in my opinion.

Stephen Arnold, November 3, 2009

Notice to the Department of Commerce: no one paid me to write this article explaining its fun factor.

MarkLogic: The Shift Beyond Search

June 5, 2009

Editor’s note: I gave a talk at a recent user group meeting. My actual remarks were extemporaneous, but I did prepare a narrative from which I derived my speech. I am reproducing my notes so I don’t lose track of the examples. I did not mention specific company names. The Successful Enterprise Search Management (SESM) reference is to the new study Martin White and I wrote for Galatea, a publishing company in the UK. MarkLogic paid me to show up and deliver a talk, and the addled goose wishes other companies would turn to Harrod’s Creek for similar enlightenment. MarkLogic is an interesting company because it goes “beyond search”. The firm addresses the thorny problem of information architecture. Once that issue is confronted, search, reports, repurposing, and other information transformations become much more useful to users. If you have comments or corrections to my opinions, use the comments feature for this Web log. The talk was given in early May 2009, and the Tyra Banks example is now a bit stale. Keep in mind this is my working draft, not my final talk.

Introduction

Thank you for inviting me to be at this conference. My topic is “Multi-Dimensional Content: Enabling Opportunities and Revenue.” A shorter title would be repurposing content to save and make money from information. That’s an important topic today. I want to make a reference to real time information, present two brief cases I researched, offer some observations, and then take questions.

Let me begin with a summary of an event that took place in Manhattan less than a month ago.

Real Time Information

America’s Next Top Model wanted to add some zest to its popular television reality program. The idea was to hold an audition for short models, not the lanky male and female prototypes with whom we are familiar.

The short models gathered in front of a hotel on Central Park South. In a matter of minutes, the crowd began to grow. A police cruiser stopped, and the two officers found themselves watching a full-fledged mêlée in progress, complete with swinging shoulder bags, spike heels, and hair spray. Every combatant was five feet six inches tall or shorter.

The officers called for the SWAT team, but the police were caught by surprise.

I learned in the course of the nine months of research for the new study written by Martin White (a UK-based information governance expert) and myself that a number of police and intelligence groups have embraced one of MarkLogic’s systems to prevent this type of surprise.

Real-time information flows from Twitter, Facebook, and other services are, at their core, publishing methods. The messages may be brief, less than 140 characters or about 12 to 14 words, but they pack a wallop.

[Screen shot: MarkMail]

MarkLogic’s slicing and dicing capabilities open new revenue opportunities.

Here’s a screenshot of the product about which we heard quite positive comments. This is MarkMail, and it makes it possible to take content from real-time systems such as mail and messaging, process it, and use that information to create opportunities.

Intelligence professionals use the slicing and dicing capabilities to generate intelligence that can save lives and reduce to some extent the type of reactive situation in which the NYPD found itself with the short models disturbance.

Financial services and consulting firms can use MarkMail to produce high value knowledge products for their clients. Publishing companies may have similar opportunities to produce high grade materials from high volume, low quality source material.


Exclusive Interview: Donna Spencer, Enterprise Systems Expert

April 20, 2009

Editor’s Note: Another speaker for what looks like a stellar conference agreed to an interview with Janus Boye. As you know, the Boye 09 Conference in Philadelphia takes place the first week in May 2009, May 5 to May 7, to be precise. Attendees can choose from a number of special interest tracks. These include a range of topics: strategy and governance, Intranet, Web content management, SharePoint, user experience, and eHealth. Click here for more conference information. Janus Boye spoke with Donna Spencer on April 16, 2009.

Ms. Spencer is a freelance information architect, interaction designer and writer. She plans how to present the things you see on your computer screen, so that they’re easy to understand, engaging and compelling: things like the navigation, forms, categories and words on intranets, websites, web applications and business systems.

The full text of the interview appears below.

Why is it so hard for organizations to get a grip on user experience design?

I don’t know that this is necessarily true. There are lots of organizations creating awesome user experiences. Of course, there are a lot who aren’t creating great experiences, but it isn’t because they can’t get a grip on user experience, it is because they care more about themselves than about their customers. If they really cared about their customers they’d do stuff to make their experiences great – and that’s possible without even knowing anything formal about user experience. But because they don’t care about their customers, they will fail, as they should…

Is content or visual design most important to the user experience?

Content (or functionality) is ultimately what people visit a website, intranet or application for. So it’s really, really important to get that right. If the core of the product is bad, it isn’t going to work.

But the visual design is often the part that helps people to get to the content. If the layout is poor, the colours and contrast awful and the site looks like it was designed in 1995, that’s going to stop people from even trying.

So both are important, though if I ever had to choose, I’d go for great content.

Is your book on card sorting really going to be released in 2009?

Yes, by the time the conference is on, there should be real, printed books. 150-odd pages of card sorting goodness. I hear that it should be out around 28 April. Really. I promise.

Does Facebook actually offer a better user experience after the redesign?

That’s a really interesting question. I can only speak for myself, but the thing that struck me about the redesign is that all of a sudden Facebook feels like a different beast. It used to be a site where friends were, but also where there were events, and groups and silly apps. Now it just feels like twitter that you can reply to. It feels like they have done a complete turn-around on who they actually are.

So for me the experience is worse. I can get a better idea of what my friends are doing, but I do that via twitter. Now it’s much harder for me to experience groups, events and all the other things we used to do there. I’m definitely using it less.

Why are you speaking at a Philadelphia web conference organized by a company based in Denmark?

Because they rock! But really, their core business overlaps a lot with what I do. I’m interested in the content the conference offers and I think my experience offers a lot to the attendees. Plus I’ve never been to Philly, and travelling to new places is a wonderful learning experience.

Lou Rosenfeld on Content Architecture

April 15, 2009

Editor’s Note: The Boye 09 Conference in Philadelphia takes place the first week in May 2009, May 5 to May 7, to be precise. Attendees can choose from a number of special interest tracks. These include strategy and governance, Intranet, Web content management, SharePoint, user experience, and eHealth. You can get more information about this conference here. One of the featured speakers is Lou Rosenfeld. You can get more information here. Janus Boye spoke with Mr. Rosenfeld on April 14, 2009. The full text of the interview appears below.

Why is it so hard for organizations to get a grip on user experience design?

Because UX is an interdisciplinary pursuit. In most organizations, the people who need to work together to develop good experiences–designers, developers, content authors, customer service personnel, business analysts, product managers, and more–currently work in separate silos. Bad idea. Worse, these people already have a hard time working together because they don’t speak the same language.

Once you get them all in the same place and help them to communicate better, they’ll figure out the rest.

Why is web analytics relevant when talking about user experience?

Web sites exist to achieve goals of some sort. UX people, for various reasons, rely on qualitative research methods to ensure their designs meet those goals. Conversely, Web analytics people rely on quantitative methods. Both are incomplete without the other – one helps you figure out what’s going on, the other why. UX and WA folks are two more groups that need help communicating; I’m hoping my talk in some small way helps them see how they fit together.

Is your book “Information Architecture for the World Wide Web” still relevant 11 years later?

Nah, not the first edition from 1998. It was geared toward developing sites–and information architectures–from scratch. But the second edition, which came out in 2002, was almost a completely new book, much longer and geared toward tuning existing sites that were groaning under the weight of lots of content: good and bad, old and new. The third edition–which was more of a light update–came out in 2006. I don’t imagine information architecture will ever lose relevance as long as there’s content. In any case, O’Reilly has sold about 130,000 copies, so apparently they think our book is relevant.

Does Facebook actually offer a better user experience after the redesign?

I really don’t know. I used to find Facebook an excellent platform for playing Scrabble, but thanks to Hasbro’s legal department, the Facebook version of Scrabble has gone the way of all flesh. Actually, I think it’s back now, but I’ve gotten too busy to fall again to its temptation.

Sorry, that’s something of an underhanded swipe at Facebook. But now, as before, I find it too difficult to figure out. I have a hard time finding (and installing) applications that should be at my fingertips. I’m overwhelmed – and, sometimes, troubled – by all the notifications which seem to be at the core of the new design. I’d far prefer to keep up with people via Twitter (I’m @louisrosenfeld), which actually integrates quite elegantly with the other tools I already use to communicate, like my blog (http://louisrosenfeld.com) and email. But I’m the wrong person to ask. I’m not likely Facebook’s target audience. And frankly, my opinion here is worth what you paid for it. Much better to do even a lightweight user study to answer your question.

Why are you speaking at a Philadelphia web conference organized by a company based in Denmark?

Because they asked so nicely. And because I hope that someday they’ll bring me to their Danish event, so I can take my daughter to the original Legoland.

Janus Boye, April 15, 2009

Bob Boiko, Exclusive Interview

April 9, 2009

The J Boye Conference will be held in Philadelphia, May 5 to May 7, 2009. Attendees can choose from a number of special interest tracks. These include strategy and governance, Intranet, Web content management, SharePoint, user experience, and eHealth. You can get more information about this conference here.

One of the featured speakers is Bob Boiko, author of Laughing at the CIO and a senior lecturer at the University of Washington iSchool. Peter Sejersen spoke with Mr. Boiko about the upcoming conference and information management today.


Why is it better to talk about “Information Management” than “Content Management”?

Content is just one kind of information. Document management, records management, asset management and a host of other “managements” including data management all deal with other worthy forms of information. While the objects differ between managements (CM has content items, DM has files, and so on) the principles are the same. So why not unite as a discipline around information rather than fracture because you call them records and I call them assets?

Who should be responsible for the information management in the organization?

That’s a hard question to answer outside of a particular organizational context. I can’t tell you who should manage information in *your* organization. But it seems to me in general that we already have *Information* Technology groups and Chief *Information* Officers, so they would be a good place to start. The real question is: are the people with the titles ready to really embrace the full spectrum of activities that their titles imply?

What is your best advice to people working with information management?

Again, advice has to vary with the context. I’ve never found two organizations that needed the same specific advice. However, we can all benefit from this simple idea. If, as we all seem to believe, information has value, then our first requirement must be to find that value and figure out how to quantify it in terms of both user information needs and organizational goals.  Only then should we go on to building systems that move information from source to destination because only then will we know what the right sources and destinations are.

Your book “Laughing at the CIO” has a catchy title, but have you ever laughed at your CIO yourself?

I don’t actually. But it is always amazing to me how many nervous (and not so nervous) snickers I hear when I say the title. The sad fact is that a lot of the people I interact with don’t see their leadership as relevant. Many (but definitely not all) IT leaders forget or never knew that there is an I to be led as well as a T. It’s not malicious; it has just never been their focus. I gave the book that title in an attempt to make it less ignorable to IT leaders. Once a leader (or would be leader) picks the book up, I hope it helps them build a base of strength and power based on the strategic use of information as well as technology.

Why are you speaking at a Philadelphia web conference organized by a company based in Denmark?

Janus and his crew are dynamite organizers. They know how to make a conference much more than a series of speeches. They have been connecting professionals and leaders with each other and with global talent for a long time. Those Danes get it and they know how to get you to get it too.

Peter Sejersen, J Boye. April 9, 2009

Exclusive Interview with David Milward, CTO, Linguamatics

February 16, 2009

Stephen Arnold and Harry Collier interviewed David Milward, the chief technical officer of Linguamatics, on February 12, 2009. Mr. Milward will be one of the featured speakers at the April 2009 Boston Search Engine Meeting. You will find minimal search “fluff” at this important conference. The focus is upon search, information retrieval, and content processing. You will find no staffed trade show booths, no multi-track programs that distract, and no search engine optimization sessions. The Boston Search Engine Meeting is focused on substance from informed experts. More information about the premier search conference is here. Register now.

The full text of the interview with David Milward appears below:

Will you describe briefly your company and its search / content processing technology?

Linguamatics’ goal is to enable our customers to obtain intelligent answers from text – not just lists of documents. We’ve developed agile natural language processing (NLP)-based technology that supports meaning-based querying of very large datasets. Results are delivered as relevant, structured facts and relationships about entities, concepts and sentiment.

Linguamatics’ main focus is solving knowledge discovery problems faced by pharma/biotech organizations. Decision-makers need answers to a diverse range of questions from text, both published literature and in-house sources. Our I2E semantic knowledge discovery platform effectively treats that unstructured and semi-structured text as a structured, context-specific database they can query to enable decision support.

Linguamatics was founded in 2001, is headquartered in Cambridge, UK with US operations in Boston, MA. The company is privately owned, profitable and growing, with I2E deployed at most top-10 pharmaceutical companies.
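
To make “structured facts, not document lists” concrete, here is a toy pattern-based extractor (a sketch only; Linguamatics’ I2E technology is far more sophisticated than a regular expression): a question returns structured rows rather than documents to read.

```python
# Toy illustration of meaning-based querying (a sketch, not how I2E
# works internally): extract (entity, relation, entity) facts so a
# query returns structured rows instead of a list of documents.
import re

sentences = [
    "BRCA1 interacts with BARD1.",
    "Aspirin inhibits COX-2.",
]

PATTERN = re.compile(r"(\w[\w-]*) (interacts with|inhibits) (\w[\w-]*)")

facts = []
for s in sentences:
    m = PATTERN.search(s)
    if m:
        facts.append((m.group(1), m.group(2), m.group(3)))

print(facts)
# [('BRCA1', 'interacts with', 'BARD1'), ('Aspirin', 'inhibits', 'COX-2')]
```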

[Image: Linguamatics splash page]

What are the three major challenges you see in search / content processing in 2009?

The obvious challenges I see include:

  • The ability to query across diverse high volume data sources, integrating external literature with in-house content. The latter content may be stored in collaborative environments such as SharePoint, and in a variety of formats including Word and PDF, as well as semi-structured XML.
  • The need for easy and affordable access to comprehensive content such as scientific publications, and being able to plug content into a single interface.
  • The demand by smaller companies for hosted solutions.

With search / content processing decades old, what have been the principal barriers to resolving these challenges in the past?

People have traditionally been able to do simple querying across multiple data sources, but there has been an integration challenge in combining different data formats, and typically the rich structure of the text or document has been lost when moving between formats.

Publishers have tended to develop their own tools to support access to their proprietary data. There is now much more recognition of the need for flexibility to apply best of breed text mining to all available content.

Potential users were reluctant to trust hosted services when queries are business-sensitive. However, hosting is becoming more common, and a considerable amount of external search is already happening using Google and, in the case of life science researchers, PubMed.

What is your approach to problem solving in search and content processing?

Our approach encompasses all of the above. We want to bring the power of NLP-based text mining to users across the enterprise – not just the information specialists.  As such we’re bridging the divide between domain-specific, curated databases and search, by providing querying in context. You can query diverse unstructured and semi-structured content sources, and plug in terminologies and ontologies to give the context. The results of a query are not just documents, but structured relationships which can be used for further data mining and analysis.

Multi-core processors provide significant performance boosts. But search / content processing often faces bottlenecks and latency in indexing and query processing. What’s your view on the performance of your system or systems with which you are familiar?

Our customers want scalability across the board – both in terms of the size of the document repositories that can be queried and also appropriate querying performance.  The hardware does need to be compatible with the task.  However, our software is designed to give valuable results even on relatively small machines.

People can have an insatiable demand for finding answers to questions – and we typically find that customers quickly want to scale to more documents, harder questions, and more users. So any text mining platform needs to be both flexible and scalable to support evolving discovery needs and maintain performance.  In terms of performance, raw CPU speed is sometimes less of an issue than network bandwidth especially at peak times in global organizations.

Information governance is gaining importance. Search / content processing is becoming part of eDiscovery or internal audit procedures. What’s your view of the role of search / content processing technology in these specialized sectors?

Implementing a proactive e-Discovery capability, rather than reacting to issues when they arise, is becoming a strategy to minimize potential legal costs. The forensic abilities of text mining are highly applicable to this area and have an increasing role to play in both eDiscovery and auditing. In particular, the ability to search for meaning and to detect even weak signals connecting information from different sources, along with provenance, is key.

As you look forward, what are some new features / issues that you think will become more important in 2009? Where do you see a major break-through over the next 36 months?

Organizations are still challenged to maximize the value of what is already known – both in internal documents or in published literature, on blogs, and so on.  Even in global companies, text mining is not yet seen as a standard capability, though search engines are ubiquitous. This is changing and I expect text mining to be increasingly regarded as best practice for a wide range of decision support tasks. We also see increasing requirements for text mining to become more embedded in employees’ workflows, including integration with collaboration tools.

Graphical interfaces and portals (now called composite applications) are making a comeback. Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009? What semantic considerations do you bring to your product and research activities?

Customers recognize the value of linking entities and concepts via semantic identifiers. There’s effectively a semantic engine at the heart of I2E and so semantic knowledge discovery is core to what we do.  I2E is also often used for data-driven discovery of synonyms, and association of these with appropriate concept identifiers.

In the life science domain commonly used identifiers such as gene ids already exist.  However, a more comprehensive identification of all types of entities and relationships via semantic web style URIs could still be very valuable.
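
Associating synonyms with concept identifiers can be pictured as a simple mapping layer (the identifiers below are illustrative, not real ontology URIs): every surface form resolves to one concept id, so a query for any synonym retrieves facts recorded under the others.

```python
# Illustrative synonym-to-concept mapping (identifiers are made up, not
# real ontology URIs): all surface forms resolve to one concept id.

CONCEPTS = {
    "acetylsalicylic acid": "ex:drug/0001",
    "aspirin": "ex:drug/0001",
    "asa": "ex:drug/0001",
    "cox-2": "ex:gene/0042",
}

def concept_id(term):
    """Resolve a surface form to its concept identifier, if known."""
    return CONCEPTS.get(term.lower())

print(concept_id("Aspirin") == concept_id("ASA"))  # True: same concept
```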

Where can I find more information about your products, services, and research?

Please contact Susan LeBeau (susan.lebeau@linguamatics.com, tel: +1 774 571 1117) and visit www.linguamatics.com.

Stephen Arnold (ArnoldIT.com) and Harry Collier (Infonortics, Ltd.), February 16, 2009
