Recorded Future in the Spotlight: An Interview with Christopher Ahlberg

April 5, 2011

It is big news when In-Q-Tel, the investment arm of the US intelligence community, funds a company. It is really big news when Google funds a company. But when both of these tech-savvy organizations fund a company, Beyond Search has to take notice.

After some floundering around, ArnoldIT was able to secure a one-on-one interview with the founder of Recorded Future. The company is one of the next-generation cloud-centric analytics firms. What sets the company apart technically is, of course, the magnetism that pulled In-Q-Tel and Google to the Boston-based firm.

Mr. Ahlberg, one of the founders of Spotfire which was acquired by the hyper-smart TIBCO organization, has turned his attention to Web content and predictions. Using sophisticated numerical recipes, Recorded Future can make observations about trends. This is not fortune telling, but mathematics talking.

In my interview with Mr. Ahlberg, he said:

We set out to organize unstructured information at very large scale by events and time. A query might return a link to a document that says something like “Hu Jintao will tomorrow land in Paris for talks with Sarkozy” or “Apple will next week hold a product launch event in San Francisco”. We wanted to take this information and make insights available through stunning user experiences and application programming interfaces. Our idea was that an API would allow others to tap into the richness and potential of Internet content in a new way.

When I probed for an example, he told me:

What we do is to tag information very, very carefully. For example, we add metatags that make explicit when we locate an item of data. We tag when we find a datum, when it was published, when we analyzed it, and what actual time point (past, present, future) to which the datum refers. The time precision is quite important. Time makes it possible for end users and modelers to deal with this important attribute. At this stage in our technology’s capabilities, we’re not trying to claim that we can beat someone like Reuters or Bloomberg at delivering a piece of news the fastest. But if you’re interested in monitoring, for example, the co-incidence of an insider trade with a product recall, we can probably beat most at that.
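The four time attributes Mr. Ahlberg describes can be sketched as a simple record. This is purely illustrative; the field names and structure are our assumptions, not Recorded Future's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class TaggedDatum:
    """One item of content carrying the four time attributes described above."""
    text: str
    harvested: date             # when the system found the item
    published: date             # when the source published it
    analyzed: date              # when the analytics ran over it
    refers_to: Optional[date]   # the time point (past, present, future) the text is about

# A statement published on April 4 about an event on April 5:
datum = TaggedDatum(
    text="Hu Jintao will tomorrow land in Paris for talks with Sarkozy",
    harvested=date(2011, 4, 5),
    published=date(2011, 4, 4),
    analyzed=date(2011, 4, 5),
    refers_to=date(2011, 4, 5),
)

# The gap between publication time and referenced time marks this as forward-looking:
is_forward_looking = datum.refers_to is not None and datum.refers_to > datum.published
```

Separating "when published" from "when referred to" is what lets a modeler query for statements about the future rather than merely recent statements.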

To read the full text of the interview with Mr. Ahlberg click here. The interview is part of the Search Wizards Speak collection of first person narratives about search and content processing. Available without charge on the ArnoldIT.com Web site, the more than 50 interviews comprise the largest repository of first hand explanations of “findability” available.

If you want your search or content processing company featured in this interview series, write seaky2000 at yahoo dot com.

Stephen E Arnold, April 5, 2011

Freebie

Linguamatics Takes to the Cloud

March 22, 2011

One of the leaders in enterprise text mining, Linguamatics, recently announced its newest software creation in “I2E OnDemand – Cloud (Online) Text Mining”.  The company’s flagship product, I2E, is an enterprise version of NLP-based text mining software, largely implemented in the medical and pharmaceutical industries.  Now Linguamatics adds I2E OnDemand to its offerings menu, matching the popular I2E capabilities with cloud computing for those companies with fewer resources stacked in their corners.

The write-up boasts:

“I2E OnDemand provides a cost-effective, accessible, high performance text mining capability to rapidly extract facts and relationships from the MEDLINE biomedical literature database, supporting business-critical decision making within your projects. MEDLINE is one of the most commonly accessed resources for research by the pharmaceutical and biotech industries.”

Of course, if searching additional data sources is required, it is possible to move to the enterprise version of I2E. A trial version for evaluation is available by request from the Web site. Linguamatics has been diversifying in the last 12 months. In 2009, I characterized Linguamatics as a vendor with a product tailored to the needs of the pharma and medical sectors. Now Linguamatics appears to be making moves outside of these vertical sectors.

Sarah Rogers, March 22, 2011

Freebie

Rosette Linguistics Platform Releases Latest Version

March 10, 2011

Basis Technology has announced the most recent release of its Rosette Linguistics Platform. Rosette is the firm’s multilingual text analytics software. Among the features of the new release is the addition of Finnish, Hebrew, Thai, and Turkish to the system’s 24-language capability. One point that we noted is that this release of Rosette sports an interesting mix of compatible search engines. According to the Basis Tech announcement:

“Bundled connectors enable applications built with Apache Lucene, Apache Solr, dtSearch Text Retrieval Engine, and LucidWorks Enterprise to incorporate advanced linguistic capabilities, including document language identification, multilingual search, entity extraction, and entity resolution.”

Several observations seem warranted. First, Basis Tech is moving beyond providing linguistic functionality. The company is pushing into text analytics and search. Second, Basis Tech is supporting commercial and open source search systems; namely, the SharePoint-centric dtSearch and Lucid Imagination’s open source solution.

The question becomes, “What is the business trajectory of Basis Tech? Will it become a competitor to the vendors with which the company has worked for many years? Will it morph into a new type of linguistic-centric analytics firm?” Stay tuned.

Cynthia Murrell, March 10, 2011

Freebie

Automated Understanding: Digital Reasoning Cracks the Information Maze

March 4, 2011

I learned from one reader that the presentation by Tim Estes, the founder of Digital Reasoning, caused some positive buzz at a recent conference on the west coast. According to my source, this was a US government sponsored event focused on where content processing was going. The surprise was that as other presenters talked about the future, a company called Digital Reasoning displayed a next generation system. Keep in mind that i2 Ltd. is a solid analyst’s tool with technology roots that stretch back 15 years. (I did some work for the founder of i2 a few years ago and have a great appreciation for the value of the system in law enforcement casework.) Palantir has some useful visualization tools, but the company continues to attract attention due to litigation and brushes with outfits that have some interesting sales practices. Beyond Search covered this story here and here.


ArnoldIT.com sees Digital Reasoning’s Synthesys as solving difficult information puzzles quickly and efficiently because it eliminates most of the false path or trial-and-error of traditional systems. Solving the information maze of real world flows is now possible in our view.

The shift was from semi-useful predictive numerical recipes and overlays or augmented outputs to something quite new and different. The Digital Reasoning presentation focused on real data and what the company called “automated understanding.”

For a few bucks last year, one of my colleagues and I got a look at the automated understanding approach of the Synthesys 3 platform. Tim Estes explained that real data poses major challenges to systems that lack an ability to process large flows, discern nuances, and apply what Mr. Estes described as “entity oriented analytics.”

Our take at ArnoldIT.com is that Digital Reasoning moves “beyond search” in a meaningful way. The key points we recall from our briefing were that a modular approach eliminates the need for a massive infrastructure build and that the analytics reflect what is happening in a real-time flow of unstructured information. My personal view is that historical research is best served by key word systems. The more advanced methods deliver actionable information and better decisions by focusing on the vast amounts of “now” data. A single Twitter message can be important. A meaningful analysis of a flow of Twitter messages moves insight to the next level.


Attivio Unveils Maturity Model

March 4, 2011

Our aggregators returned this interesting piece to us from PR Newswire: “Attivio Releases Maturity Model for Unified Information Access.” Attivio has released a series of whitepapers detailing the benefits of using unified information access (UIA). The purpose of UIA is to help businesses see how using information access technologies can increase revenue, cut costs, and increase customer satisfaction for long term strategic planning. Using the UIA model, businesses can learn new approaches to data integration. Attivio said:

“The objective of the model is to help organizations establish, benchmark, and improve information access and management strategies. According to the report, the first step in developing a plan for implementing UIA is to conduct a self-assessment of current capabilities and needs, then determine the urgency and importance of solving each issue identified.  As an organization moves into the next stage, the incremental capabilities and benefits are measured across two vectors – business criticality and information management integration and process improvements.”

The UIA model can be used by any business to improve its information assets and overall practices.

Attivio is a technology firm that offers functions and systems that push beyond keyword search and retrieval.

Whitney Grace, March 4, 2011

Freebie

Attensity Goes on a Social Offensive

March 3, 2011

Remember the pigeons from Psych 101?

Beginning with the discoveries made by Pavlov and his dogs to the direct application of the science by Ogilvy and his Madison Ave. minions, psychology has long played a part in shaping us as consumers.

Now it seems the growing worldwide embrace of social media has altered one more aspect of our lives: how we are marketed to, or, to phrase it more accurately, how we have begun to market ourselves.

Attensity’s “Customer Segmentation for the Social Media Age“ (which the Attensity writer admits was inspired by a series of tweets) delves into the ramifications of new media for conventional segmentation practices.

Attensity explains that before the technological advances made over the last three decades, gathering the information necessary to construct effective marketing campaigns consumed both substantial amounts of time and capital. Despite these costs,

” … Segmentation was the best attempt that we as marketers had to give our customers what they needed, …”

What has changed?

The buyer’s willingness, nay, their seeming compulsion, to share every fleeting thought and scrap of personal information about themselves with anyone clever enough to operate one of the many devices that link us to the Web. The new breed of admen, instead of sorting through pounds of trial results and customer surveys, can now, as Attensity states:

” … scour the social web to find mentions of our brands, our competitors’ brands and product categories.”

An interesting read and something to think about the next time you feel the urge to “friend” your laundry detergent.

In a related post on the Parisian consulting and technology firm Capgemini’s site, Senior Consultant Jude Umeh discusses the melding of social media surveillance with the review, application, and management of the collected data. His perspective is informed by the hands-on experience he gained at a partner training session organized by Attensity.

Attensity is collaborating with Pega, a firm offering business process management (BPM) and customer relationship management (CRM) software. BPM and CRM are factors in the new math of modern marketing, and Attensity seems to have discovered the formula that will position the collective at the head of the pack.

Layering their respective technologies, the group appears poised to revolutionize the way information is gleaned from new media. Can Attensity pull off a home run with every search and content processing vendor “discovering” these market sectors? We do not know.

Michael Cory, March 3, 2011

Freebie

Meaning in Casual Wording

March 3, 2011

I love science.  Paired with my increasing passion for language and grammar, a sweeter cocktail could hardly be imagined.  “Do Casual Words Betray Warlike Intent?” was a fascinating read.

At the recent American Association for the Advancement of Science (AAAS) meeting, James Pennebaker, a University of Texas at Austin psychologist, spoke about the study that he, assorted colleagues, and the Department of Homeland Security have been engaged in recently. The focus of the research has been on four similar Islamic groups and the relationship between the speech they employ and the actions that follow. The collective hope is that the study’s findings can be used to forecast aggressive activity.

Isolating pronouns, determiners, adjectives and prepositions, the group mines them for what Pennebaker calls “linguistic shifts”.  To date they have determined that of the four, the two groups who have committed acts of violence, telegraphed said destructiveness with the use of “more personal pronouns, and words with social meaning or which convey positive or negative emotions.”  Aside from differentiating between various stylistic elements of expression, Pennebaker has also scrutinized statements made by warmongers from our past, including George W. Bush, with interesting results.
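The core of the measurement Pennebaker describes, tracking changes in the rate of function words such as personal pronouns, can be sketched as a toy. This is an illustration of the idea only, not his actual method; real studies use validated word-category dictionaries, and the pronoun list below is our own assumption.

```python
import re

# A small, assumed set of English personal pronouns, for illustration only.
PERSONAL_PRONOUNS = {
    "i", "me", "my", "mine", "we", "us", "our", "ours",
    "you", "your", "yours", "he", "him", "his", "she", "her", "hers",
    "they", "them", "their", "theirs",
}

def pronoun_rate(text: str) -> float:
    """Personal pronouns per 100 words (0.0 for empty text)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in PERSONAL_PRONOUNS)
    return 100.0 * hits / len(words)

def linguistic_shift(earlier: str, later: str) -> float:
    """Change in pronoun rate between two samples; positive means an increase."""
    return pronoun_rate(later) - pronoun_rate(earlier)
```

A rising value from `linguistic_shift` across successive statements would be the crude analogue of the “more personal pronouns” signal the researchers report watching for.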

Skepticism has always fueled scientific endeavors, and we must continue to ask questions, especially those that breed discomfort.  This science deals with a very grey area and Pennebaker himself labels the results as only “modest probabilistic predictions”.  There is no question that this information must be used responsibly, but my aforementioned appreciation for the field keeps me from seeing this as a negative.

If one can discern an opponent’s intent in a fight or a game of cards by careful observation, why is it so strange to think the same could be done from listening to what they say?

Sarah Rogers, March 3, 2011

Freebie

Search and Virtualization

March 1, 2011

Quick. What enterprise search vendors’ systems permit virtualization? The answer is that the marketing professional from any search firm will say, “We do.” However, the technology professional who rarely speaks to customers will say, “Well, that is an interesting question.”

Virtualization turns big honking servers into lots of individual virtual machines or servers. Virtualization is easy to talk about as search vendors tout their systems’ capabilities as business intelligence services. But in our experience it remains both science and art. Another way to describe virtualization and search is “research project.”

Our contributing writer Sarah Rogers reports:

The commercial climate for virtualization is changing.  Business intelligence (BI) represents just one force exerting its influence.  As the needs of numerous businesses reach levels where accessing, housing and reviewing information are yesterday’s problems, the new focus becomes how to maximize efficiency without renting secondary office space to handle the servers required.  Many are turning to virtualization.

But virtualization isn’t all perks, as examined in “Are SQL Server BI systems compatible with virtualization?”. Systems operating under the BI umbrella will not always function at full capacity when connected to a virtualized infrastructure. Contemporary BI tools build detail-heavy analytical models in memory on demand. These analytical systems often are designed to retain vast amounts of data, which, when operating through a virtualized platform, can breed obstacles in the path to access. Another issue is what is described as overcommitment, where hosts ration out available memory to all those connected. A fine idea, though again analytical systems may overload the designated allocation and diminish results.

Though traditional databases are suited to sidestep these compatibility issues, they seem to be struggling, awash in the flood of their in-memory counterparts. At least that is one opinion floating about. It is clear that other variables exist that will spoil the math when looking to pass through to the other side. So here is another opinion: the physical database does still have a viable role. Why not keep your options open?

Sarah Rogers, March 1, 2011

Freebie

Data Mining Tactics: Palantir and Friends

February 21, 2011

Here in Harrod’s Creek, life is simple. We have one road, a store, and a pond. Elsewhere, there are machinations that simple folks like me and the goslings have difficulty understanding. I noticed an impassioned blog post from Craft Is Cranium here. Then we saw the Register’s write-up “HBGary Quails in the Face of Anonymous.” As I understand the issue, experts working on the commercial side of intelligence saw Wikileaks as a business opportunity. The experts did not want to sell their technology to Wikileaks. The experts wanted the US government to pay them to nibble away at Wikileaks. The assumption was that Wikileaks was a security challenge and could be sanded down or caged using various advanced technologies. A good example is the thread on Quora.com “Why Would Palantir Go after WikiLeaks?”

The Quora answers are interesting, and as you might imagine, different from what folks in Harrod’s Creek might suggest. First, there is a link to an interesting article titled “Data Intelligence Firms Proposed a Systematic Attack against WikiLeaks.” It is difficult to determine what is accurate and what is information shaping, but what is presented is interesting.

Second, one answer struck me as pure MBA. The proposal to nibble on Wikileaks’ toes was summarized this way:

For money. It’s a pitch deck targeted towards the concerns of governmental and financial institutions.

Third, there is a paraphrase of the specific motive for floating this trial balloon:

“You [the US government] have to respond to Wikileaks immediately, by giving us massive amounts of money for our software and consulting services. You cannot wait to write us a massive blank check, because the threat of Wikileaks is too great.”

What I find interesting is that the sharp edges of the Palantir-type approach may create some problems for search companies now venturing into “business intelligence.” My view is that enterprise search marketers are often crafted with memory foam and rounded edges. The Palantir type approach seems to be elbows and sharp fingernails.

Quite a few search vendors want to play in the “intelligence” sector. I am not sure that technology will win out over attitude and aggressiveness. Palantir, as you may recall, was engaged last year in a legal spat with i2 Ltd., another foundation company in certain intelligence sectors. Incumbents may eat the softer newcomers the way a goose gobbles bread crumbs.

Stephen E Arnold, February 21, 2011

Freebie

Exclusive Interview: Abe Music, Digital Reasoning

February 16, 2011

Digital Reasoning, based in Franklin, Tennessee, is one of a handful of companies breaking a path through the content jungle. The firm’s approach processes a wide range of “big data”. The system’s proprietary methods make it easy to discern trends, identify high-value items of data, and see the relationships among people, places, and things otherwise lost in the “noise” of digital information.

In addition to a number of high-profile customers in the defense and intelligence communities, the company is attracting interest from healthcare and financial institutions. Also, professionals engaged in eDiscovery and practitioners in competitive intelligence are expressing interest in the company’s approach to “big data”. The idea of “big data”, that is, large volumes of structured and unstructured content such as Twitter messages, Web logs, reports, email messages, blog data, and system-generated numerical outputs, is increasingly important. The problem is that the content arrives continuously and in ever increasing volume.

Digital Reasoning has created a system and an interface that converts a nearly impossible reading task into reports, displays, and graphics, eliminating the drudgery and the normal process of looking at only a part of a very large collection of content. Their flagship product, Synthesys®, essentially converts “big data” into the underlying facts, connections, and associations, making it possible to understand large-scale data by examining facts instead of reading first.

I spoke with senior software engineer, Abe Music about Digital Reasoning’s approach and the firm’s activities in the open source community. Like some other next-generation analytics companies, Digital Reasoning makes use of open source software in order to reduce development time and introduce a standards-based approach into the firm’s innovative technology.

The full text of my interview with Abe Music appears below.

When did you first start following open source software?

I originally began learning about open-source software while in college. At Western Kentucky University we had a very prominent Linux users group that advocated open-source software wherever possible. This continued throughout my college career in any project that would allow it and after, where in my first job out of school, Python was the language of choice.

How does Digital Reasoning create a contribution to Open Source community through github?

Currently, PyStratus is the only contribution through github although more contributions are underway.

What is github?

Good question. GitHub is a Web-based hosting service for software projects that use the Git revision control system. GitHub offers both commercial plans and free accounts for open source projects, and it is a key community resource for open source developers.

What is PyStratus?

Here at Digital Reasoning, we were using a set of Python scripts from Cloudera’s Hadoop distribution to manage our Hadoop clusters in the cloud.

Soon after, we had the need to easily manage our Cassandra clusters as well. We decided to leverage the work Cloudera had already done by converting the Cloudera Distribution of Hadoop or CDH scripts into an all-in-one solution for managing Hadoop, Cassandra and hybrid Hadoop/Cassandra clusters.

For us, we did a complete refactoring of the CDH scripts into an easily extensible Python framework for managing our services in the cloud.

What’s “refactoring”?

“Refactoring” to me is the process of changing a computer program’s source code without modifying its external functional behavior. Here at Digital Reasoning, when we refactor we are improving some of the attributes of the software, such as performance or resource consumption.
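A minimal sketch of that definition (hypothetical code, not anything from Digital Reasoning's codebase): both functions below return the same result for any input, but the refactored version avoids building a throwaway intermediate list, improving resource consumption without changing external behavior.

```python
def total_even_squares_before(numbers):
    # Before refactoring: two passes and an intermediate list held in memory.
    squares = []
    for n in numbers:
        if n % 2 == 0:
            squares.append(n * n)
    total = 0
    for s in squares:
        total += s
    return total

def total_even_squares_after(numbers):
    # After refactoring: a single pass with a generator expression,
    # no intermediate list. External behavior is identical.
    return sum(n * n for n in numbers if n % 2 == 0)
```

Because the observable behavior is unchanged, the same test suite passes before and after, which is what makes a change a refactoring rather than a feature change.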

Thank you. Why are some firms supporting open source software?

I personally don’t see any downside to open-source software, but, of course, I am quite biased.

I can see, from the business side, a reason to stay closed if you had developed your business around some intellectual property that you wanted to control.

But I believe that open-source software really fills a void in the tech community because it allows anyone to take the software and extend it to fit their individual requirements without having to reinvent the wheel.

I also think it is important to use open-source software as a reference to learn some new technology or algorithm.

Personally I think that working with open source software is a great way to learn and I would recommend anyone writing code to consider using open source as a way to add to their personal coding knowledge base.

What are the advantages of tapping into the open source software trend that seems to be building?

One of the major advantages I see from using open-source software is that it makes it possible to take advantage of outstanding work from a community of developers. With open source software, I can put software to work immediately without much effort.

As a developer leveraging that technology — and not developing it yourself — you get the added benefit of very minimal maintenance on that piece of your software. If there is a bug, the community taps the collective pool of expertise. When someone adds to a project, everyone can take advantage of that innovation. The advantages of this approach range from greater reliability to a more rapid pace of innovation.

And I would definitely recommend giving back to the community wherever possible.

When you want to use open source software, what is your process for testing and determining what you can do with a particular library or component?

That’s a very good question. This is my favorite part actually.

Because there are so many great open-source technologies out there I get to play with all of them when considering which component(s) to use. I don’t have a particular process that I use to evaluate the software. I have a clear idea of what I need out of the component before I begin the evaluation. If there are similar components I will try to match each of them up to one another and determine which one fits my requirements the best.

Is this work or play? You seem quite enthusiastic about what strikes me as very complicated technical work.

To be candid, I find exploring, learning, and building enjoyable. I can’t speak for the other technologists at Digital Reasoning, but I find this type of problem-solving and analytical work both fun and rewarding. Maybe “play” is not the right word, but I like the challenge of this type of engineering.

Quite a few companies are supporting open source, including IBM. In your view, will more companies be developing with open source in mind?

Yes, I definitely believe that more and more companies will begin supporting the open-source community simply because of the vast amount of benefits they can gain.

As a strategic move to support open-source a company could easily reduce development costs by “outsourcing” development to a particular piece of community-supported technology rather than developing it themselves.

The use of open source means that an organization not only gets access to a piece of software it did not have to develop entirely itself, but it also gets to interface with potential candidates for employment, contribute to fostering new ideas, and work within a community that is very passionate about what it is contributing to.

What next for Digital Reasoning and open source?

Our commitment to open source is strong. We have a number of ideas about projects. Look for further announcements in the future.

How can a person get more information about Digital Reasoning?

Our Web site is www.digitalreasoning.com. I know that you have interviewed our founder, Tim Estes, on two separate occasions, and there is a great deal of detailed information in those interviews as well. We have also recently announced the Synthesys® Platform as a beta program allowing API access to our “big data” analytics with your data, where we take complete responsibility for managing the cloud resources. More information about this new program can be found at http://dev.digitalreasoning.com.

Beyond Search Comment

A number of companies have embraced open source software. In an era of big data, Digital Reasoning has identified open source technology that helps cope with the challenges of peta-scale flows of structured and unstructured content. The firm’s new version of its flagship Synthesys service delivers blistering performance and easy-to-understand outputs in near-real time. Open source software has influenced Digital Reasoning and Digital Reasoning’s contribution to the open source community helps make useful technical innovations available to other developers.

Our view is that Digital Reasoning is taking a solid engineering approach to service its customers.

Stephen E Arnold, January 12, 2011
