Search and Virtualization

March 1, 2011

Quick. Which enterprise search vendors’ systems permit virtualization? The answer is that the marketing professional from any search firm will say, “We do.” However, the technology professional who rarely speaks to customers will say, “Well, that is an interesting question.”

Virtualization is turning big honking servers into lots of individual machines or servers. Virtualization is easy to talk about as search vendors tout their systems’ capabilities as business intelligence services. But in our experience, virtualization remains both science and art. Another way to describe virtualization and search is “research project.”

Our contributing writer Sarah Rogers reports:

The commercial climate for virtualization is changing.  Business intelligence (BI) represents just one force exerting its influence.  As the needs of numerous businesses reach levels where accessing, housing and reviewing information are yesterday’s problems, the new focus becomes how to maximize efficiency without renting secondary office space to handle the servers required.  Many are turning to virtualization.

But virtualization isn’t all perks, as examined in “Are SQL Server BI systems compatible with virtualization?” Systems operating under the BI umbrella will not always function at full capacity when running on a virtualized platform. Contemporary BI tools build detail-heavy analyses in memory, on demand. These analytical systems are often designed to hold vast amounts of data in memory, which can create access bottlenecks on a virtualized platform. Another issue is what is described as overcommitment, where a host rations out its available memory among all the virtual machines it runs. A fine idea, though analytical systems may demand more than their allotted share and performance suffers.
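The overcommitment problem comes down to simple arithmetic. The sketch below is only an illustration: the function names and the memory figures are hypothetical and are not taken from the article cited above. It compares the memory a host has promised to its guests with what is physically installed, and estimates the shortfall if memory-hungry BI guests all ask for their share at once.

```python
def overcommit_ratio(host_memory_gb, vm_allocations_gb):
    """Memory promised to guests divided by physical memory on the host.

    A value above 1.0 means the host is overcommitted.
    """
    if host_memory_gb <= 0:
        raise ValueError("host memory must be positive")
    return sum(vm_allocations_gb) / host_memory_gb


def shortfall_gb(host_memory_gb, vm_demands_gb):
    """Memory the host cannot supply if every guest uses its current demand at once."""
    return max(0.0, sum(vm_demands_gb) - host_memory_gb)


# Hypothetical host with 64 GB of RAM and three guests (assumed figures).
host_gb = 64
allocated_gb = [32, 32, 16]   # memory promised to each guest
peak_demand_gb = [30, 30, 16]  # what in-memory BI workloads might actually touch

ratio = overcommit_ratio(host_gb, allocated_gb)
missing = shortfall_gb(host_gb, peak_demand_gb)
print(f"overcommit ratio: {ratio:.2f}, shortfall at peak: {missing:.1f} GB")
```

A ratio modestly above 1.0 is often tolerable for mixed workloads, but in-memory analytics tends to touch most of what it is allocated, so any shortfall shows up as swapping and slower queries.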

Though traditional databases are well suited to sidestepping these compatibility issues, they seem to be struggling, awash in the flood of their in-memory counterparts. At least that is one opinion floating about. It is clear that other variables exist that will spoil the math when looking to move to a virtualized platform. So here is another opinion: the physical database does still have a viable role. Why not keep your options open?

Sarah Rogers, March 1, 2011

Freebie

Data Mining Tactics: Palantir and Friends

February 21, 2011

Here in Harrod’s Creek, life is simple. We have one road, a store, and a pond. Elsewhere, there are machinations that simple folks like me and the goslings have difficulty understanding. I noticed an impassioned blog post from Craft Is Cranium here. Then we saw the Register’s write up “HBGary Quails in the Face of Anonymous.” As I understand the issue, experts working in the commercial side of intelligence saw Wikileaks as a business opportunity. The experts did not want to sell their technology to Wikileaks. The experts wanted to get the US government to pay the experts to nibble away at Wikileaks. The assumption was that Wikileaks was a security challenge and could be sanded down or caged using various advanced technologies. A good example is the thread on Quora.com “Why Would Palantir Go after WikiLeaks?”

The Quora answers are interesting, and as you might imagine, different from what folks in Harrod’s Creek might suggest. First, there is a link to an interesting article titled “Data Intelligence Firms Proposed a Systematic Attack against WikiLeaks.” It is difficult to determine what is accurate and what is information shaping, but what is presented is interesting.

Second, one answer struck me as pure MBA. The proposal to nibble on Wikileaks’ toes was summarized this way:

For money. It’s a pitch deck targeted towards the concerns of governmental and financial institutions.

Third, there is a paraphrase of the specific motive for floating this trial balloon:

“You [the US government] have to respond to Wikileaks immediately, by giving us massive amounts of money for our software and consulting services. You cannot wait to write us a massive blank check, because the threat of Wikileaks is too great.”

What I find interesting is that the sharp edges of the Palantir-type approach may create some problems for search companies now venturing into “business intelligence.” My view is that enterprise search marketers are often crafted with memory foam and rounded edges. The Palantir type approach seems to be elbows and sharp fingernails.

Quite a few search vendors want to play in the “intelligence” sector. I am not sure that technology will win out over attitude and aggressiveness. Palantir, as you may recall, was engaged last year in a legal spat with i2 Ltd., another foundation company in certain intelligence sectors. Incumbents may eat the softer newcomers the way a goose gobbles bread crumbs.

Stephen E Arnold, February 21, 2011

Freebie

Exclusive Interview: Abe Music, Digital Reasoning

February 16, 2011

Digital Reasoning, based in Franklin, Tennessee, is one of a handful of companies breaking a path through the content jungle. The firm’s approach processes a wide range of “big data”. The system’s proprietary methods make it easy to discern trends, identify high-value items of data, and see the relationships among people, places, and things otherwise lost in the “noise” of digital information.

In addition to a number of high-profile customers in the defense and intelligence communities, the company is attracting interest from healthcare and financial institutions. Also, professionals engaged in eDiscovery and practitioners in competitive intelligence are expressing interest in the company’s approach to “big data”. The idea of “big data”, meaning large volumes of structured and unstructured content such as Twitter messages, Web logs, reports, email messages, blog data and system generated numerical outputs, is increasingly important. The problem is that the content arrives continuously and in ever increasing volume.

Digital Reasoning has created a system and an interface that converts a nearly impossible reading task into reports, displays, and graphics that eliminate the drudgery and the normal process of looking at only a part of a very large collection of content. Their flagship product, Synthesys®, essentially converts “big data” into the underlying facts, connections and associations, making it possible to understand large scale data by examining facts instead of reading first.

I spoke with senior software engineer Abe Music about Digital Reasoning’s approach and the firm’s activities in the open source community. Like some other next-generation analytics companies, Digital Reasoning makes use of open source software in order to reduce development time and introduce a standards-based approach into the firm’s innovative technology.

The full text of my interview with Abe Music appears below.

When did you first start following open source software?

I originally began learning about open-source software while in college. At Western Kentucky University we had a very prominent Linux users group that advocated open-source software wherever possible. This continued throughout my college career in any project that would allow it and after, where in my first job out of school, Python was the language of choice.

How does Digital Reasoning contribute to the open source community through github?

Currently, PyStratus is the only contribution through github although more contributions are underway.

What is github?

Good question. github is a Web-based hosting service for open source software projects that use a revision control system. github offers both commercial plans and free accounts for open source projects, and it is a key community resource for the open source developers.

What is PyStratus?

Here at Digital Reasoning, we were using a set of Python scripts from Cloudera’s Hadoop distribution to manage our Hadoop clusters in the cloud.

Soon after, we had the need to easily manage our Cassandra clusters as well. We decided to leverage the work Cloudera had already done by converting the Cloudera Distribution of Hadoop or CDH scripts into an all-in-one solution for managing Hadoop, Cassandra and hybrid Hadoop/Cassandra clusters.

For us, we did a complete refactoring of the CDH scripts into an easily extensible Python framework for managing our services in the cloud.

What’s “refactoring”?

“Refactoring” to me is the process of changing a computer program’s source code without modifying its external functional behavior. Here at Digital Reasoning, when we refactor we are improving some of the attributes of the software, such as performance or resource consumption.
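To make the definition concrete, here is a small before-and-after sketch. It is not taken from PyStratus or any Digital Reasoning code; the function names are hypothetical and chosen only for illustration. Both versions return the same result for the same input, which is the point of a refactoring: the internals change, the external behavior does not.

```python
from collections import Counter


def word_count_before(lines):
    """Original version: manual dictionary bookkeeping."""
    counts = {}
    for line in lines:
        for word in line.split():
            word = word.lower()
            if word in counts:
                counts[word] = counts[word] + 1
            else:
                counts[word] = 1
    return counts


def word_count_after(lines):
    """Refactored version: same inputs and outputs, simpler internals."""
    return dict(Counter(word.lower() for line in lines for word in line.split()))


sample = ["The cat", "the hat THE cat"]
# A refactoring is safe only if callers cannot tell the difference.
assert word_count_before(sample) == word_count_after(sample)
```

The final assertion is the kind of check that usually accompanies a refactoring: a test that pins down the external behavior before the internals are touched.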

Thank you. Why are some firms supporting open source software?

I personally don’t see any downside to open-source software, but, of course, I am quite biased.

I can see, from the business side, a reason to stay closed if you had developed your business around some intellectual property that you wanted to control.

But I believe that open-source software really fills a void in the tech community because it allows anyone to take the software and extend it to fit their individual requirements without having to reinvent the wheel.

I also think it is important to use open-source software as a reference to learn some new technology or algorithm.

Personally I think that working with open source software is a great way to learn and I would recommend anyone writing code to consider using open source as a way to add to their personal coding knowledge base.

What are the advantages of tapping into the open source software trend that seems to be building?

One of the major advantages I see from using open-source software is that it makes it possible to take advantage of some outstanding work from a community of developers. With open source software, I can put software to work immediately without much effort.

As a developer leveraging that technology, and not developing it yourself, you get the added benefit of very minimal maintenance on that piece of your software. If there is a bug, the community taps the collective pool of expertise. When someone adds to a project, everyone can take advantage of that innovation. The advantages of this approach range from greater reliability to a more rapid pace for innovation.

And I would definitely recommend giving back to the community wherever possible.

When you want to use open source software, what is your process for testing and determining what you can do with a particular library or component?

That’s a very good question. This is my favorite part actually.

Because there are so many great open-source technologies out there I get to play with all of them when considering which component(s) to use. I don’t have a particular process that I use to evaluate the software. I have a clear idea of what I need out of the component before I begin the evaluation. If there are similar components I will try to match each of them up to one another and determine which one fits my requirements the best.

Is this work or play? You seem quite enthusiastic about what strikes me as very complicated technical work?

To be candid, I find exploring, learning, and building enjoyable. I can’t speak for the other technologists at Digital Reasoning, but I find this type of problem-solving and analytical work both fun and rewarding. Maybe “play” is not the right word, but I like the challenge of this type of engineering.

Quite a few companies are supporting open source, including IBM. In your view, will more companies be developing with open source in mind?

Yes, I definitely believe that more and more companies will begin supporting the open-source community simply because of the vast amount of benefits they can gain.

As a strategic move to support open-source a company could easily reduce development costs by “outsourcing” development to a particular piece of community-supported technology rather than developing it themselves.

The use of open source means that an organization not only gets access to a piece of software that is not completely developed by them, but they also get to interface with some potential candidates for employment, contribute to fostering new ideas, and work within a community that is very passionate about what they are contributing to.

What next for Digital Reasoning and open source?

Our commitment to open source is strong. We have a number of ideas about projects. Look for further announcements in the future.

How can a person get more information about Digital Reasoning?

Our Web site is www.digitalreasoning.com. I know that you have interviewed our founder, Tim Estes, on two separate occasions, and there is a great deal of detailed information in those interviews as well. We have also recently announced Synthesys® Platform as a beta program allowing API access to our “big data” analytics with your data, where we take complete responsibility for managing the cloud resources. More information about this new program can be found at http://dev.digitalreasoning.com.

Beyond Search Comment

A number of companies have embraced open source software. In an era of big data, Digital Reasoning has identified open source technology that helps cope with the challenges of peta-scale flows of structured and unstructured content. The firm’s new version of its flagship Synthesys service delivers blistering performance and easy-to-understand outputs in near-real time. Open source software has influenced Digital Reasoning and Digital Reasoning’s contribution to the open source community helps make useful technical innovations available to other developers.

Our view is that Digital Reasoning is taking a solid engineering approach to service its customers.

Stephen E Arnold, January 12, 2011

OpenText Opens Advanced Content Analytics Market

February 14, 2011

Following in the footsteps of other vendors, OpenText has opened an advanced content analytics market.

“OpenText Licensing Agreement Brings Advanced Content Analytics to Market” reveals a tie up between OpenText and the National Research Council (Canada). The idea is that new Content Analytics innovations will be added to the ECM Suite and made available by spring 2011. The content analytics added to the ECM Suite will improve data mining and analysis. The key point is:

“Content analytics is the key to extracting business value from social media and text-rich online and enterprise information sources, an essential technology for marketing, online commerce, customer service, and improved search and Web experience. Given the mind-boggling growth in information volumes, no wonder uptake is booming, powered by rapid technical advances from leading-edge vendors such as OpenText.”

Content Analytics will perform data mining that will uncover and show relationships between businesses and other facts. It will be able to find information that a normal search engine wouldn’t find. This agreement is the beginning of OpenText’s effort to apply Content Analytics to all of its ECM Suite products.

Whitney Grace, February 14, 2011

Freebie

Textalyser Highlighted on Podcast

February 9, 2011

Text analysis was mentioned by the podcast No Agenda, which is hosted by Adam Curry (professional broadcast journalist) and John C. Dvorak (technical and business columnist). The No Agenda podcast team runs certain text through Textalyser and uses the output to identify “memes”; that is, words or phrases designed to be magnetic and persist in a conversation.

You can give Textalyser, the tool No Agenda mentioned, a try by navigating to http://textalyser.net/. There are two Web accessible modes. First, you can take a chunk of text and paste it into the Analysis Box on the Web page. The system will generate a report. Shown below is a portion of the Textalyser report for one of my 2010 for fee columns.

[Image: portion of a Textalyser report]

The report includes a word frequency list, a word length summary, and two, three, and four word phrase frequency lists.
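For readers who want to see what such a report involves, here is a minimal sketch of the same three measurements. It is not Textalyser’s code and makes no claim about how that service works; the function names and the simple tokenizing rule are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word occurs."""
    return Counter(tokens)


def word_length_summary(tokens):
    """Count how many words there are of each length."""
    return Counter(len(token) for token in tokens)


def phrase_frequencies(tokens, n):
    """Count every run of n consecutive words."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def report(text, top=5):
    """Bundle the measurements into one dictionary."""
    tokens = tokenize(text)
    return {
        "word_count": len(tokens),
        "top_words": word_frequencies(tokens).most_common(top),
        "word_lengths": dict(sorted(word_length_summary(tokens).items())),
        "phrases": {n: phrase_frequencies(tokens, n).most_common(top) for n in (2, 3, 4)},
    }


if __name__ == "__main__":
    print(report("the cat and the hat and the cat"))
```

Phrases that recur in the two, three, and four word lists are the natural candidates for the “memes” the No Agenda hosts look for.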

The service carries this identification notice: V 1.05 help Traduction Nieruchomości Magazine interactif Umarex Airsoft + Paintball. For more information about the service, you can navigate to this link and leave a message.

Stephen E Arnold, February 9, 2011

Freebie

Synthesys Platform Beta Available

February 7, 2011

Digital Reasoning alerted us last week that a new beta program for the Synthesys Platform is available. Digital Reasoning has emerged as one of the “leaders in complex, large scale unstructured data analytics.” We have interviewed the founder of Digital Reasoning in ArnoldIT.com’s Search Wizards Speak series. These interviews are available here and here. Digital Reasoning is one of the leaders in making next-generation analytics available via the cloud, on premises, and hybrid methods.

[Image © Digital Reasoning, 2011]

This platform version of Digital Reasoning’s software will provide beta users immediate API-level access to the firm’s analytics software and access to tools that will be added through the beta program.

Matthew Russell, vice president of engineering at Digital Reasoning said:

We are excited to introduce Synthesys Platform to the market. By allowing users to upload their data into the cloud for analysis, many more users will get the opportunity to experience next generation data analytics while exploring their own data.

Digital Reasoning Systems (www.digitalreasoning.com) solves the problem of information overload by providing the tools people need to understand relationships between entities in vast amounts of unstructured and structured data.

Digital Reasoning builds data analytic solutions based on a distinctive mathematical approach to understanding natural language. The value of Digital Reasoning is not only the ability to leverage an organization’s existing knowledge base, but also to reveal critical hidden information and relationships that may not have been apparent during manual or other automated analytic efforts. Synthesys is a registered trademark of Digital Reasoning Systems, Inc.

Digital Reasoning will be exhibiting at the upcoming Strata Conference on February 28 and March 1, 2011. For more information about Digital Reasoning, navigate to the company’s Web site at www.digitalreasoning.com.

Stephen E Arnold, February 7, 2011

Juru, Watson, I Say, Juru!

February 1, 2011

Quite a heated discussion at lunch today. One of the goslings was raving about Watson. The Jeopardy demo convinced the engineer that IBM had the next big thing in search. A person can ask a question and right away get the answer. Wow. I thought that type of computer system only worked under carefully controlled conditions, in demos, or in motion pictures.

That’s why the goslings were agitated when I said, “It is TV. TV does almost anything—well, anything—for money.” I pointed out that the game shows 21 and the $64,000 Question took some liberties to boost ratings. Have TV times changed that much? I said, “I don’t think so.”

I supported my argument by mentioning Juru. Do you remember that gem from IBM? Here’s what my Overflight system spit out.

Juru is / was a full text search “library” that would make short work of “small and mid-sized corpuses.” Of course, “small” and “mid-sized” are rarely defined either by IBM or other search researchers. The idea was that Java made it easy to run Juru on any platform. Of course, today, I don’t think Juru would work in the Android or iOS environment, but some day maybe.

Juru asserted that the system would:

  • Support different document types
  • Make use of links just like our ever tweakable PageRank-type systems
  • Generate on the fly summaries of documents
  • Perform clustering
  • Offer nifty ways to keep the indexes small and, therefore, zippy.

You can get some info at this link. There is some additional color here.

I reminded the goslings that IBM rolls out search solutions as part of its global marketing efforts. More to the point, I asked the goslings which vendors’ search systems IBM resells. I did not hear the magic words Autonomy or Endeca. IBM once loved Fast ESP.

If you want search from IBM, what do you get today? A version of the open source search solution Lucene. Why? It works pretty well. Juru, Watson, Web Fountain, et al? Well, make up your own mind with some head to head testing. I won the argument and still had to pay for lunch. Honk.

Stephen E Arnold, February 1, 2011

Freebie

SharePoint Sharing from Attivio

January 19, 2011

Of interest to businesses overwhelmed with voluminous SharePoint content: “Attivio Announces AIE for SharePoint Integration” tells of the Active Intelligence Engine’s new availability to aggregate information across not only SharePoint, but also websites, databases, email, CRM and other information sources.

According to the announcement,

“The difficulty of discovering and delivering timely insight derived from all of these resources exposes gaps in an organization’s ability to integrate and rapidly update information; providing a single method for users to find the information they need, regardless of its origins. AIE for SharePoint Integration enables secure access to all types of information by unifying diverse datasets, while avoiding the cost and delays of cumbersome legacy integration stages.”

With AIE, companies no longer have to worry about SharePoint silos that are easily accessible only within departments, and can instead maximize insight and collaboration across the entire enterprise. Attivio promises retention of data relationships from text sources, rapid implementation, and tight security, all adding up to maximized competitive advantage. We believe the company has a good approach to a very tough SharePoint challenge.

Alice Wasielewski, January 19, 2011

Google and Local Search Commitment

January 17, 2011

Google re-loads and takes aim, this time at Facebook. “Google’s Mobile Matchmaker” reports on an interview with Marissa Mayer, Google’s executive in charge of “local” products. For Google, “local” includes maps, mobile, and even social activities. “Contextual discovery,” giving automatic location-based information, or “search without search” as she calls it, is the basis of the way Mayer seeks to knock Facebook off its pedestal. Google is working on taking the location information and adding social contextual information, such as showing a person in a restaurant the menu with annotations from friends or regular customers of the venue.

When asked if Google might work with Facebook on some of these social applications, Mayer demurred, citing Facebook’s closed nature versus Google’s support of the open web. Instead, Mayer pointed to the Google social-esque alternatives such as Google Latitude, an application that follows the physical location of someone on a map. The social implications seem obvious: “Once you tell Google who your friends are on Latitude, that same information might eventually be used for other services like socially marked-up menus, if you permitted it. The point is that Google may have more ways to acquire social information than just by building its own competing social network.” My view is that Facebook may have reached its peak and is ripe for a serious alternative. The idea of friends LoJacking me on Latitude doesn’t appeal to me, but then I’m not a FourSquare fan either. Facebook, watch out: Google’s war is underway.

Alice Wasielewski, January 17, 2011

Freebie

2007 Semantic Search Info Still Relevant

January 13, 2011

Short honk. We had a long call today (January 12, 2011) about semantic search. In the course of the call, I mentioned a presentation by Jon Atle Gulla, a professor in 2007 at the Norwegian University of Science and Technology. I did some poking around and found the link to the presentation. Quite useful in 2007 and still germane today. The presentation puts into context some of the work that must be done to deploy an effective semantic technology system in an organization. The slide deck is on Slideshare at this link. Registration may be required to access the file.

Stephen E Arnold, January 13, 2011

Freebie
