Exclusive Interview: Abe Music, Digital Reasoning

February 16, 2011

Digital Reasoning, based in Franklin, Tennessee, is one of a handful of companies breaking a path through the content jungle. The firm’s approach processes a wide range of “big data”. The system’s proprietary methods make it easy to discern trends, identify high-value items of data, and see the relationships among people, places, and things otherwise lost in the “noise” of digital information.

In addition to a number of high-profile customers in the defense and intelligence communities, the company is attracting interest from healthcare and financial institutions. Professionals engaged in eDiscovery and practitioners in competitive intelligence are also expressing interest in the company’s approach to “big data”. “Big data” refers to large volumes of structured and unstructured content, such as Twitter messages, Web logs, reports, email messages, blog data, and system-generated numerical outputs, and it is increasingly important. The problem is that the content arrives continuously and in ever-increasing volume.

Digital Reasoning has created a system and an interface that convert a nearly impossible reading task into reports, displays, and graphics. These outputs eliminate the drudgery of reading and the usual practice of examining only part of a very large collection of content. The firm’s flagship product, Synthesys®, essentially converts “big data” into the underlying facts, connections, and associations, making it possible to understand large-scale data by examining facts instead of reading first.
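Digital Reasoning’s methods are proprietary and are not described in this interview. As a rough illustration of what turning text into “connections and associations” can mean, here is a minimal sketch of one common, generic technique: counting how often pairs of entities appear in the same document. It assumes the entity names have already been extracted from each document; the sample entities are made up.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs_entities):
    """Count how often each unordered pair of entities appears in the same document."""
    pairs = Counter()
    for entities in docs_entities:
        # Deduplicate within a document and sort so (a, b) and (b, a) share one key.
        for a, b in combinations(sorted(set(entities)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical, already-extracted entities for three documents.
docs = [
    ["Alice", "Nashville", "Acme Corp"],
    ["Alice", "Acme Corp"],
    ["Bob", "Nashville"],
]
counts = cooccurrence_counts(docs)
print(counts.most_common(1))  # [(('Acme Corp', 'Alice'), 2)]
```

Pairs that co-occur often become candidate relationships for an analyst to inspect; a production system would add entity extraction, disambiguation, and weighting on top of raw counts.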

I spoke with senior software engineer Abe Music about Digital Reasoning’s approach and the firm’s activities in the open source community. Like some other next-generation analytics companies, Digital Reasoning uses open source software to reduce development time and to bring a standards-based approach to the firm’s innovative technology.

The full text of my interview with Abe Music appears below.

When did you first start following open source software?

I first learned about open-source software in college. At Western Kentucky University we had a very prominent Linux users group that advocated open-source software wherever possible. I kept using open source throughout college on any project that allowed it, and afterward as well: at my first job out of school, Python was the language of choice.

How does Digital Reasoning contribute to the open source community through GitHub?

Currently, PyStratus is our only contribution through GitHub, although more contributions are underway.

What is github?

Good question. GitHub is a Web-based hosting service for software projects that use the Git revision control system. GitHub offers both commercial plans and free accounts for open source projects, and it is a key community resource for open source developers.

What is PyStratus?

Here at Digital Reasoning, we were using a set of Python scripts from Cloudera’s Hadoop distribution to manage our Hadoop clusters in the cloud.

Soon after, we needed an easy way to manage our Cassandra clusters as well. We decided to leverage the work Cloudera had already done by converting the Cloudera Distribution of Hadoop (CDH) scripts into an all-in-one solution for managing Hadoop, Cassandra, and hybrid Hadoop/Cassandra clusters.

To do that, we completely refactored the CDH scripts into an easily extensible Python framework for managing our services in the cloud.
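The interview does not show PyStratus’s code, so the following is only a minimal sketch of what an “easily extensible” service-management framework can look like in Python, assuming a registry pattern: each cluster type registers a class, and a hybrid type reuses the existing ones. The registry, class names, and launch steps are hypothetical, and the sketch returns a launch plan instead of calling any real cloud API.

```python
from abc import ABC, abstractmethod

# Hypothetical registry mapping a service name to the class that manages it.
SERVICE_REGISTRY = {}

def register_service(name):
    """Class decorator that adds a service implementation to the registry."""
    def decorator(cls):
        SERVICE_REGISTRY[name] = cls
        return cls
    return decorator

class Service(ABC):
    """Base class: each cluster type only has to describe its own launch steps."""

    def __init__(self, cluster_name):
        self.cluster_name = cluster_name

    @abstractmethod
    def launch_steps(self):
        """Return the ordered steps needed to bring the cluster up."""

    def launch(self):
        # A real tool would call a cloud provider here; this sketch just returns the plan.
        return [f"{self.cluster_name}: {step}" for step in self.launch_steps()]

@register_service("hadoop")
class HadoopService(Service):
    def launch_steps(self):
        return ["start namenode", "start jobtracker", "start datanodes"]

@register_service("cassandra")
class CassandraService(Service):
    def launch_steps(self):
        return ["start seed node", "start remaining nodes"]

@register_service("hadoop_cassandra")
class HybridService(Service):
    def launch_steps(self):
        # Reuse the existing implementations instead of duplicating them.
        return (HadoopService(self.cluster_name).launch_steps()
                + CassandraService(self.cluster_name).launch_steps())

def launch_cluster(service_name, cluster_name):
    """Look up the requested service type in the registry and launch it."""
    return SERVICE_REGISTRY[service_name](cluster_name).launch()

print(launch_cluster("cassandra", "c1"))
```

With this design, supporting a new service means writing one small class and registering it; the launch logic in the base class stays untouched.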

What’s “refactoring”?

To me, “refactoring” is the process of changing a computer program’s source code without modifying its external functional behavior. Here at Digital Reasoning, when we refactor, we are improving attributes of the software such as performance or resource consumption.
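A small, hypothetical illustration of the definition above: the two functions below compute the same result from the same input, but the second is restructured around a helper. The function names and log format are invented for the example, and the attribute improved here is readability rather than performance. The final assertion is the key idea: a refactoring should leave external behavior unchanged.

```python
# Before: one function that mixes parsing and summing in nested conditionals.
def total_bytes_before(log_lines):
    total = 0
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 2:
            if parts[1].isdigit():
                total = total + int(parts[1])
    return total

# After: same behavior, split into a small helper and a generator expression.
def parse_bytes(line):
    """Return the byte count in the second field, or 0 if it is missing or not a number."""
    parts = line.split()
    return int(parts[1]) if len(parts) >= 2 and parts[1].isdigit() else 0

def total_bytes_after(log_lines):
    return sum(parse_bytes(line) for line in log_lines)

sample = ["GET 512", "POST 1024", "malformed", "PUT abc"]
# External behavior is unchanged: both versions give the same answer.
assert total_bytes_before(sample) == total_bytes_after(sample) == 1536
```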

Thank you. Why are some firms supporting open source software?

I personally don’t see any downside to open-source software, but, of course, I am quite biased.

I can see, from the business side, a reason to stay closed if you had developed your business around some intellectual property that you wanted to control.

But I believe that open-source software really fills a void in the tech community because it allows anyone to take the software and extend it to fit their individual requirements without having to reinvent the wheel.

I also think it is important to use open-source software as a reference for learning a new technology or algorithm.

Personally, I think that working with open source software is a great way to learn, and I would recommend that anyone writing code consider open source as a way to add to their personal coding knowledge base.

What are the advantages of tapping into the open source software trend that seems to be building?

One of the major advantages I see in using open-source software is that it makes it possible to take outstanding work from a community of developers. With open source software, I can put software to work immediately without much effort.

As a developer leveraging that technology, rather than developing it yourself, you get the added benefit of very minimal maintenance on that piece of your software. If there is a bug, the community taps its collective pool of expertise. When someone adds to a project, everyone can take advantage of that innovation. The advantages of this approach range from greater reliability to a more rapid pace of innovation.

And I would definitely recommend giving back to the community wherever possible.

When you want to use open source software, what is your process for testing and determining what you can do with a particular library or component?

That’s a very good question. This is actually my favorite part.

Because there are so many great open-source technologies out there, I get to play with all of them when considering which component(s) to use. I don’t follow a formal process to evaluate the software, but I have a clear idea of what I need from the component before I begin the evaluation. If there are similar components, I compare them against one another and determine which one best fits my requirements.
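Abe Music says he has no formal evaluation process, so the following is only a minimal sketch of the “match candidates against a clear list of needs” idea he describes: list the required features, then rank candidate components by how many they cover and what each one is missing. The feature names and component names are hypothetical.

```python
def best_fit(requirements, candidates):
    """Rank candidates by how many required features they cover.

    requirements: set of feature names the component must provide.
    candidates: dict mapping a component name to the set of features it offers.
    Returns (name, missing_features) pairs, best match first.
    """
    ranked = sorted(
        candidates.items(),
        # Most requirements covered first; the name breaks ties so the order is stable.
        key=lambda item: (-len(requirements & item[1]), item[0]),
    )
    return [(name, sorted(requirements - features)) for name, features in ranked]

needs = {"python_client", "replication", "apache_license"}
options = {
    "store_a": {"python_client", "replication"},
    "store_b": {"python_client", "replication", "apache_license"},
    "store_c": {"replication"},
}
print(best_fit(needs, options)[0])  # ('store_b', [])
```

Listing the missing features next to each candidate keeps the trade-offs visible instead of hiding them behind a single score.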

Is this work or play? You seem quite enthusiastic about what strikes me as very complicated technical work.

To be candid, I find exploring, learning, and building enjoyable. I can’t speak for the other technologists at Digital Reasoning, but I find this type of problem-solving and analytical work both fun and rewarding. Maybe “play” is not the right word, but I like the challenge of this type of engineering.

Quite a few companies are supporting open source, including IBM. In your view, will more companies be developing with open source in mind?

Yes, I definitely believe that more and more companies will begin supporting the open-source community simply because of the many benefits they can gain.

As a strategic move, a company that supports open source could easily reduce development costs by “outsourcing” a particular piece of technology to the community that supports it rather than developing it in-house.

By using open source, an organization not only gets access to software it did not have to develop entirely on its own, but also gets to interact with potential candidates for employment, help foster new ideas, and work within a community that is very passionate about what it contributes.

What’s next for Digital Reasoning and open source?

Our commitment to open source is strong. We have a number of ideas about projects. Look for further announcements in the future.

How can a person get more information about Digital Reasoning?

Our Web site is www.digitalreasoning.com. I know that you have interviewed our founder, Tim Estes, on two separate occasions, and there is a great deal of detailed information in those interviews as well. We have also recently announced Synthesys® Platform as a beta program that provides API access to our “big data” analytics for your own data, with us taking complete responsibility for managing the cloud resources. More information about this new program can be found at http://dev.digitalreasoning.com.

Beyond Search Comment

A number of companies have embraced open source software. In an era of big data, Digital Reasoning has identified open source technology that helps cope with the challenges of peta-scale flows of structured and unstructured content. The firm’s new version of its flagship Synthesys service delivers blistering performance and easy-to-understand outputs in near-real time. Open source software has influenced Digital Reasoning and Digital Reasoning’s contribution to the open source community helps make useful technical innovations available to other developers.

Our view is that Digital Reasoning is taking a solid engineering approach to serving its customers.

Stephen E Arnold, January 12, 2011

OpenText Opens Advanced Content Analytics Market

February 14, 2011

Following in the footsteps of other vendors, OpenText has entered the advanced content analytics market.

“OpenText Licensing Agreement Brings Advanced Content Analytics to Market” reveals a tie-up between OpenText and the National Research Council (Canada). The plan is to add new Content Analytics innovations to the ECM Suite and make them available by spring 2011. The additions to the ECM Suite are intended to improve data mining and analysis. The key point is:

“Content analytics is the key to extracting business value from social media and text-rich online and enterprise information sources, an essential technology for marketing, online commerce, customer service, and improved search and Web experience. Given the mind-boggling growth in information volumes, no wonder uptake is booming, powered by rapid technical advances from leading-edge vendors such as OpenText.”

Content Analytics will perform data mining to uncover and show relationships among businesses and other facts, finding information that a conventional search engine would miss. The agreement is the first step in OpenText’s plan to apply Content Analytics to all of its enterprise content management suite products.

Whitney Grace, February 14, 2011

Freebie

Textalyser Highlighted on Podcast

February 9, 2011

The podcast No Agenda, hosted by Adam Curry (professional broadcast journalist) and John C. Dvorak (technical and business columnist), mentioned text analysis. The No Agenda team runs certain text through Textalyser and uses the output to identify “memes”; that is, words or phrases designed to be magnetic and persist in a conversation.

You can give Textalyser, the tool No Agenda mentioned, a try by navigating to http://textalyser.net/. There are two Web-accessible modes. First, you can paste a chunk of text into the Analysis Box on the Web page, and the system will generate a report. Shown below is a portion of the Textalyser report for one of my 2010 for-fee columns.

[Screenshot: portion of a Textalyser report]

The report includes a word frequency report, a word length summary, and frequency reports for two-, three-, and four-word phrases.
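Textalyser’s own rules (stop words, minimum word length, and so on) are not described here, so this is a minimal sketch, under a simple-tokenizer assumption, of how the same kinds of reports can be computed: word frequency, a word length summary, and counts of two-, three-, and four-word phrases (n-grams). The regular expression and function names are my own choices, and phrases are allowed to run across sentence boundaries.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes as words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_length_summary(words):
    """Map each word length to how many words have that length."""
    return Counter(len(w) for w in words)

def phrase_frequency(words, n):
    """Count every run of n consecutive words (an n-gram)."""
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))

def report(text):
    words = tokenize(text)
    return {
        "words": Counter(words),
        "lengths": word_length_summary(words),
        # Two-, three-, and four-word phrase reports.
        "phrases": {n: phrase_frequency(words, n) for n in (2, 3, 4)},
    }

r = report("Big data is big. Big data is here.")
print(r["words"].most_common(1))    # [('big', 3)]
print(r["phrases"][2]["big data"])  # 2
```

Phrases that recur often across a body of text are the raw material for the kind of “meme” spotting the No Agenda team describes.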

The service carries this identification notice: V 1.05 help Traduction Nieruchomości Magazine interactif Umarex Airsoft + Paintball. For more information about the service, you can navigate to this link and leave a message.

Stephen E Arnold, February 9, 2011

Freebie

Synthesys Platform Beta Available

February 7, 2011

Digital Reasoning alerted us last week that a new beta program for the Synthesys Platform is available. Digital Reasoning has emerged as a leader in “complex, large scale unstructured data analytics.” We have interviewed the founder of Digital Reasoning twice in ArnoldIT.com’s Search Wizards Speak series; these interviews are available here and here. Digital Reasoning is one of the leaders in making next-generation analytics available via the cloud, on premises, and through hybrid methods.


© Digital Reasoning, 2011

This platform version of Digital Reasoning’s software will give beta users immediate API-level access to the firm’s analytics software and to tools that will be added during the beta program.

Matthew Russell, vice president of engineering at Digital Reasoning, said:

We are excited to introduce Synthesys Platform to the market. By allowing users to upload their data into the cloud for analysis, many more users will get the opportunity to experience next generation data analytics while exploring their own data.

Digital Reasoning Systems (www.digitalreasoning.com) solves the problem of information overload by providing the tools people need to understand relationships between entities in vast amounts of unstructured and structured data.

Digital Reasoning builds data analytic solutions based on a distinctive mathematical approach to understanding natural language. The value of Digital Reasoning is not only the ability to leverage an organization’s existing knowledge base, but also to reveal critical hidden information and relationships that may not have been apparent during manual or other automated analytic efforts. Synthesys is a registered trademark of Digital Reasoning Systems, Inc.

Digital Reasoning will be exhibiting at the upcoming Strata Conference on February 28 and March 1, 2011. For more information about Digital Reasoning, navigate to the company’s Web site at www.digitalreasoning.com.

Stephen E Arnold, February 7, 2011

Juru, Watson, I Say, Juru!

February 1, 2011

Quite a heated discussion at lunch today. One of the goslings was raving about Watson. The Jeopardy demo convinced the engineer that IBM had the next big thing in search. A person can ask a question and right away get the answer. Wow. I thought that type of computer system only worked under carefully controlled conditions, in demos, or in motion pictures.

That’s why the goslings were agitated when I said, “It is TV. TV does almost anything—well, anything—for money.” I pointed out that the game shows 21 and the $64,000 Question took some liberties to boost ratings. Have TV times changed that much? I said, “I don’t think so.”

I supported my argument by mentioning Juru. Do you remember that gem from IBM? Here’s what my Overflight system spit out.

Juru is / was a full text search “library” that would make short work of “small and mid-sized corpuses.” Of course, “small” and “mid-sized” are rarely defined, either by IBM or by other search researchers. The idea was that Java made it easy to run Juru on any platform. Of course, today, I don’t think Juru would work in the Android or iOS environment, but someday maybe.

IBM asserted that Juru would:

  • Support different document types
  • Make use of links, just like our ever-tweakable PageRank-type systems
  • Generate on-the-fly summaries of documents
  • Support clustering
  • Use nifty ways to keep the indexes small and, therefore, zippy.
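The post does not say how Juru kept its indexes small, so the following is not Juru’s method. It is a minimal sketch of one standard textbook technique for the job: build an inverted index, store each posting list as gaps between sorted document IDs, and write each gap with variable-byte encoding so that small gaps take a single byte.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def vbyte_encode(numbers):
    """Variable-byte encoding: 7 data bits per byte; the high bit marks a number's last byte."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.insert(0, n % 128)
            if n < 128:
                break
            n //= 128
        chunk[-1] += 128  # flag the final byte of this number
        out.extend(chunk)
    return bytes(out)

def vbyte_decode(data):
    numbers, n = [], 0
    for byte in data:
        if byte < 128:
            n = n * 128 + byte  # continuation byte: accumulate 7 more bits
        else:
            numbers.append(n * 128 + (byte - 128))  # last byte of this number
            n = 0
    return numbers

def compress_postings(doc_ids):
    """Store gaps between sorted IDs instead of the IDs; small gaps fit in one byte."""
    if not doc_ids:
        return b""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)

def decompress_postings(data):
    ids, running = [], 0
    for gap in vbyte_decode(data):
        running += gap
        ids.append(running)
    return ids

index = build_index(["ibm search", "search lucene", "juru search"])
print(index["search"])  # [0, 1, 2]
packed = compress_postings([3, 130, 131, 100000])
print(len(packed), decompress_postings(packed))  # 6 [3, 130, 131, 100000]
```

Four document IDs stored as 4-byte integers would take 16 bytes; as variable-byte gaps they take 6.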

You can get some info at this link. There is some additional color here.

I reminded the goslings that IBM rolls out search solutions as part of its global marketing efforts. More to the point, I asked the goslings which vendors’ search systems IBM resells. I did not hear the magic words Autonomy or Endeca. IBM once loved Fast ESP.

If you want search from IBM, what do you get today? A version of the open source search solution Lucene. Why? It works pretty well. Juru, Watson, WebFountain, et al.? Well, make up your own mind with some head-to-head testing. I won the argument and still had to pay for lunch. Honk.

Stephen E Arnold, February 1, 2011

Freebie

SharePoint Sharing from Attivio

January 19, 2011

Of interest to businesses overwhelmed with voluminous SharePoint content: “Attivio Announces AIE for SharePoint Integration” describes how Attivio’s Active Intelligence Engine (AIE) can now aggregate information not only from SharePoint but also from websites, databases, email, CRM, and other information sources.

According to the announcement,

“The difficulty of discovering and delivering timely insight derived from all of these resources exposes gaps in an organization’s ability to integrate and rapidly update information; providing a single method for users to find the information they need, regardless of its origins. AIE for SharePoint Integration enables secure access to all types of information by unifying diverse datasets, while avoiding the cost and delays of cumbersome legacy integration stages.”

With AIE, companies no longer have to worry about SharePoint silos that are easily accessible only within individual departments; instead, they can maximize insight and collaboration across the entire enterprise. Attivio promises that retaining data relationships from text sources, rapid implementation, and tight security will add up to maximum competitive advantage. We believe the company has a good approach to a very tough SharePoint challenge.

Alice Wasielewski, January 19, 2011

Google and Local Search Commitment

January 17, 2011

Google reloads and takes aim, this time at Facebook. “Google’s Mobile Matchmaker” reports on an interview with Marissa Mayer, Google’s executive in charge of “local” products. For Google, “local” includes maps, mobile, and even social activities. “Contextual discovery,” which delivers location-based information automatically, or “search without search” as she calls it, is the basis of Mayer’s plan to knock Facebook off its pedestal. Google is working on combining location information with social context, such as showing a person in a restaurant the menu annotated by friends or regular customers of the venue. When asked whether Google might work with Facebook on some of these social applications, Mayer demurred, citing Facebook’s closed nature versus Google’s support of the open web. Instead, Mayer pointed to Google’s social-esque alternatives, such as Google Latitude, an application that shows a person’s physical location on a map. The social implications seem obvious: “Once you tell Google who your friends are on Latitude, that same information might eventually be used for other services like socially marked-up menus, if you permitted it. The point is that Google may have more ways to acquire social information than just by building its own competing social network.”

My view is that Facebook may have reached its peak and is ripe for a serious alternative. The idea of LoJacking friends on Latitude doesn’t appeal to me, but then I’m not a Foursquare fan either. Facebook, watch out: Google’s war is underway.

Alice Wasielewski, January 17, 2011

Freebie

2007 Semantic Search Info Still Relevant

January 13, 2011

Short honk. We had a long call today (January 12, 2011) about semantic search. During the call, I mentioned a presentation given in 2007 by Jon Atle Gulla, then a professor at the Norwegian University of Science and Technology. I did some poking around and found the link to the presentation. It was quite useful in 2007 and is still germane today. The presentation puts into context some of the work that must be done to deploy an effective semantic technology system in an organization. The slide deck is on Slideshare at this link. Registration may be required to access the file.

Stephen E Arnold, January 13, 2011

Freebie

Attensity’s New Year’s Resolution

January 11, 2011

Attensity, now a multi-faceted technology management firm, has set a new course for itself this year in “Making it Work in 2011!” In the past, the company’s focus seemed to be increasingly on government contracts, as illustrated by the formation of the subsidiary Attensity Government Systems. Well, oh how “the times they are a-changin’.” In a blog post on its website in late December, buried beneath references to both classical music and reality television, the company lays out its new direction.

Currently, a massive amount of data is generated by the surging wave of social networking sites and the new breed of citizen journalists. Per Attensity: “These days, competitors often have access to the same source material of customer conversations from Twitter, Facebook, blogs, forums, and review sites. However, where the battle is truly won or lost is in how companies are able to harness and arrange that material, embellishing it with insights from their own internal survey and call center data, and transforming it into a symphony of action.” So Attensity’s new focus for the coming year is to improve its current menu of offerings, giving companies the option to act on multi-channel conversations.

It appears that, like many companies, Attensity sees an opportunity in repackaging its services for broader consumption in an effort to cash in on the public’s embrace of these fresh and exciting technologies. The same blog post gives a quick nod to the outgoing year’s poor economic conditions, though one is still left wondering whether the company’s main motive for shifting from its government affiliations to private consumers is a bountiful 2011. No problem with that.

Sarah Rogers, January 11, 2011

Freebie

Lexalytics and DataSift

December 31, 2010

If sentiment analysis is the key ingredient in the social content technology cocktail, then Lexalytics aims to be the brand of choice for businesses and individual consumers everywhere. MediaSift Ltd., the British company behind the DataSift social media filtering engine, is eager to see a partnership with Lexalytics and its text analysis software take root.

We learned in “DataSift Taps Lexalytics to Help ‘Tune Your Data’” that one focus of the alliance is the ever-increasing volume of data generated by tweets. “Lexalytics provides the ability to automatically extract companies, people or product names, without having a list of them ahead of time; the ability to calculate tweet, entity, and ‘linked-content’ sentiment; output lists of positive/negative entities; and more.”
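Lexalytics Salience’s actual methods are not described in the announcement. Here is a minimal, lexicon-based sketch of the three outputs the quote lists: tweet sentiment, entity sentiment, and positive/negative entity lists. Two simplifying assumptions to flag loudly: the tiny sentiment lexicon is invented, and the entities are supplied as a list, whereas the quote says Lexalytics extracts them without one.

```python
import re

# Tiny hypothetical sentiment lexicon; real systems use far larger, tuned resources.
LEXICON = {"love": 1.0, "great": 1.0, "fast": 0.5, "slow": -0.5, "hate": -1.0, "broken": -1.0}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def tweet_sentiment(text):
    """Sum the lexicon scores of the words in a tweet (0.0 means neutral)."""
    return sum(LEXICON.get(word, 0.0) for word in tokenize(text))

def entity_sentiment(tweets, entities):
    """Average the sentiment of the tweets that mention each entity."""
    totals = {}
    for entity in entities:
        scores = [tweet_sentiment(t) for t in tweets if entity.lower() in t.lower()]
        if scores:
            totals[entity] = sum(scores) / len(scores)
    return totals

def split_by_polarity(entity_scores):
    """Return (positive, negative) entity lists; neutral entities appear in neither."""
    positive = sorted(e for e, s in entity_scores.items() if s > 0)
    negative = sorted(e for e, s in entity_scores.items() if s < 0)
    return positive, negative

tweets = ["I love the new PhoneX, so fast", "PhoneX battery is broken", "TabletY is slow, I hate it"]
scores = entity_sentiment(tweets, ["PhoneX", "TabletY"])
print(split_by_polarity(scores))  # (['PhoneX'], ['TabletY'])
```

Averaging per entity means one angry tweet does not flip an otherwise positive product; a commercial engine would also handle negation, sarcasm, and entity extraction.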

Nick Halstead is the founder and CEO of Favorit Ltd., owner of Tweetmeme, a service designed to tally links and determine which are the most popular. “An important part of the metrics we provide through Datasift is the sentiment, or tonality of the data. We needed an engine that could integrate quickly into our environment and start immediately providing accurate sentiment analysis across all our data services,” says Halstead. “Lexalytics Salience gives us a great combination of flexible integration, high performance and accurate sentiment analysis.”

Another goal of the union is to give users the tools to observe and respond in real time by interpreting massive amounts of data from a variety of online sources. The Lexalytics software can process any English text and is compatible with multiple systems. Looks like another player in social content technology is being added to the shaker.

Sarah Rogers, December 31, 2010

Freebie
