Search and Virtualization

March 1, 2011

Quick. What enterprise search vendors’ systems permit virtualization? The answer is that the marketing professional from any search firm will say, “We do.” However, the technology professional who rarely speaks to customers will say, “Well, that is an interesting question.”

Virtualization is turning big honking servers into lots of individual virtual machines or servers. Virtualization is easy to talk about as search vendors tout their systems’ capabilities as business intelligence services. But in our experience, it remains both science and art. Another way to describe virtualization and search is “research project.”

Our contributing writer Sarah Rogers reports:

The commercial climate for virtualization is changing.  Business intelligence (BI) represents just one force exerting its influence.  As the needs of numerous businesses reach levels where accessing, housing and reviewing information are yesterday’s problems, the new focus becomes how to maximize efficiency without renting secondary office space to handle the servers required.  Many are turning to virtualization.

But virtualization isn’t all perks, as examined in “Are SQL Server BI systems compatible with virtualization?” Systems operating under the BI umbrella will not always function at full capacity when connected to an incorporeal network. Contemporary BI groups construct detail-heavy examination patterns inside existing memory as needed. These analytical systems often are designed to retain vast amounts of data, which, when operating through a virtualized platform, can breed obstacles in the path to access. Another issue is what is described as overcommitment, where hosts ration out available memory to all those connected. A fine idea, though again analytical systems may overload the designated operating pattern and diminish results.
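To make the overcommitment idea concrete, here is a minimal Python sketch. The function name and the host and guest figures are hypothetical, invented for illustration; they do not come from the article cited above. The sketch only computes how much memory a host has promised to its guests relative to what it physically has.

```python
def overcommit_ratio(host_memory_gb, vm_allocations_gb):
    """Return memory promised to guests divided by physical host memory.

    A value above 1.0 means the host has promised more memory than it
    physically has, which is the overcommitment situation described above.
    """
    if host_memory_gb <= 0:
        raise ValueError("host_memory_gb must be positive")
    return sum(vm_allocations_gb) / host_memory_gb


# Hypothetical host: 64 GB of physical RAM, four guests promised 24 GB each.
ratio = overcommit_ratio(64, [24, 24, 24, 24])
print(f"Overcommit ratio: {ratio:.2f}")  # 96 GB promised / 64 GB physical = 1.50
```

An in-memory BI workload that actually uses its full allocation is the case where a ratio above 1.0 is most likely to hurt, since the host cannot satisfy every guest at once.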

Though traditional databases are suited to disambiguate these compatibility issues, they seem to be struggling, awash in the flood of their in-memory counterparts. At least that is one opinion floating about. It is clear that other variables exist that will spoil the math when looking to pass through to the other side. So here is another opinion: the physical database does still have a viable role. Why not keep your options open?

Sarah Rogers, March 1, 2011

Freebie

Top Web Search Engines Analyzed

February 22, 2011

As the pool of internet search engines grows, so does the need to know which is best. “Best” can be based upon a number of factors: recall, precision, speed, etc. The people at AnalyzeThis.ru have developed a tool for comparing the top search engines.

They state,

“In order to independently assess the search quality, we developed a set of analyzers, one for each type of search queries . . . We measure the quality of navigational and informational search, the percentage of pornography among the pages found by a search engine, etc.”

Google is predictably the overall winner. Separate results are provided for each individual analyzer measured. This tells us that people are searching for a way to independently analyze the quality of search engines; however, the endeavor is subjective and objectivity is hard to produce. A Russian firm is behind the research and although the webpage is well translated, it would be useful to know in which language the research was conducted as it may have an impact on results.
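For readers unfamiliar with two of the factors mentioned above, precision and recall, here is a minimal Python sketch using their standard definitions. The document identifiers are made up, and nothing here reflects how AnalyzeThis.ru actually builds its analyzers.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    precision = relevant results returned / all results returned
    recall    = relevant results returned / all relevant documents
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: the engine returns four pages, two of which are among
# the three pages a human judge marked as relevant.
results = ["doc1", "doc2", "doc3", "doc4"]
judged_relevant = ["doc2", "doc4", "doc7"]
p, r = precision_recall(results, judged_relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

The subjectivity noted above enters through the list of judged-relevant documents: change the judge, or the language of the queries, and both numbers can move.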

Emily Rae Aldridge, February 22, 2011

Compusearch Launches PRISM Business Intelligence Dashboard

February 18, 2011

Why search? Look at the dashboard.

“Compusearch Puts Mission-Critical Information at Agency Fingertips” at redOrbit announces the release of Compusearch’s PRISM Business Intelligence Dashboard. Users can now access information on key performance indicators from within the PRISM software.

This is not, however, a simple point and click search. For confirmation, just take a look at the Dashboard. As the article explains:

“This add-on module to PRISM provides the power to support a wide range of custom report style widgets with drill-down and drill-through capability, as well as robust data visualization features that can be animated and interactive. The PRISM BI Dashboard is based on an open architecture and utilizes XML and web services, which allows data and information from across agency enterprises to be easily monitored, analyzed and reported.”

Compusearch focuses on software and systems integration, mostly for government agencies. The hitch may be that if you look at the dashboard when you drive, you may run over a pedestrian. Is this a risk when performing business intelligence and analysis?

Cynthia Murrell, February 18, 2011

Freebie

Exclusive Interview: Abe Music, Digital Reasoning

February 16, 2011

Digital Reasoning, based in Franklin, Tennessee, is one of a handful of companies breaking a path through the content jungle. The firm’s approach processes a wide range of “big data”. The system’s proprietary methods make it easy to discern trends, identify high-value items of data, and see the relationships among people, places, and things otherwise lost in the “noise” of digital information.

In addition to a number of high-profile customers in the defense and intelligence communities, the company is attracting interest from healthcare and financial institutions. Professionals engaged in eDiscovery and practitioners in competitive intelligence are also expressing interest in the company’s approach to “big data”. The idea of “big data”, meaning large volumes of structured and unstructured content such as Twitter messages, Web logs, reports, email messages, blog data and system generated numerical outputs, is increasingly important. The problem is that the content arrives continuously and in ever increasing volume.

Digital Reasoning has created a system and an interface that converts a nearly impossible reading task into reports, displays, and graphics that eliminate the drudgery and the normal process of looking at only a part of a very large collection of content. Their flagship product, Synthesys®, essentially converts “big data” into the underlying facts, connections and associations, making it possible to understand large scale data by examining facts instead of reading first.

I spoke with senior software engineer Abe Music about Digital Reasoning’s approach and the firm’s activities in the open source community. Like some other next-generation analytics companies, Digital Reasoning makes use of open source software in order to reduce development time and introduce a standards-based approach into the firm’s innovative technology.

The full text of my interview with Abe Music appears below.

When did you first start following open source software?

I originally began learning about open-source software while in college. At Western Kentucky University we had a very prominent Linux users group that advocated open-source software wherever possible. This continued throughout my college career in any project that would allow it and after, where in my first job out of school, Python was the language of choice.

How does Digital Reasoning create a contribution to Open Source community through github?

Currently, PyStratus is the only contribution through github although more contributions are underway.

What is github?

Good question. github is a Web-based hosting service for open source software projects that use a revision control system. github offers both commercial plans and free accounts for open source projects, and it is a key community resource for the open source developers.

What is PyStratus?

Here at Digital Reasoning, we were using a set of Python scripts from Cloudera’s Hadoop distribution to manage our Hadoop clusters in the cloud.

Soon after, we had the need to easily manage our Cassandra clusters as well. We decided to leverage the work Cloudera had already done by converting the Cloudera Distribution of Hadoop or CDH scripts into an all-in-one solution for managing Hadoop, Cassandra and hybrid Hadoop/Cassandra clusters.

For us, we did a complete refactoring of the CDH scripts into an easily extensible Python framework for managing our services in the cloud.

What’s “refactoring”?

“Refactoring” to me is the process of changing a computer program’s source code without modifying its external functional behavior. Here at Digital Reasoning, when we refactor we are improving some of the attributes of the software such as performance or resource consumption, etc.
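[Editor’s illustration, not part of the interview: a minimal Python sketch of refactoring as defined above. Both functions are hypothetical and have nothing to do with PyStratus or the CDH scripts. The point is only that the second version changes the source code while returning the same results as the first.]

```python
def average_before(values):
    """Original version: manual loop with separate running totals."""
    total = 0
    count = 0
    for v in values:
        total = total + v
        count = count + 1
    if count == 0:
        return 0.0
    return total / count


def average_after(values):
    """Refactored version: same inputs, same outputs, simpler code."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# The external behavior is unchanged: both versions agree on every sample,
# including the empty list.
samples = [[], [4], [1, 2, 3, 4]]
assert all(average_before(s) == average_after(s) for s in samples)
```

A check like the final assertion is what lets a developer refactor with confidence: if the old and new versions ever disagree, the change altered behavior and is no longer a pure refactoring.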

Thank you. Why are some firms supporting open source software?

I personally don’t see any downside to open-source software, but, of course, I am quite biased.

I can see, from the business side, a reason to stay closed if you had developed your business around some intellectual property that you wanted to control.

But I believe that open-source software really fills a void in the tech community because it allows anyone to take the software and extend it to fit their individual requirements without having to reinvent the wheel.

I also think it is important to use open-source software as a reference to learn some new technology or algorithm.

Personally I think that working with open source software is a great way to learn and I would recommend anyone writing code to consider using open source as a way to add to their personal coding knowledge base.

What are the advantages of tapping into the open source software trend that seems to be building?

One of the major advantages I see from using open-source software is that it makes possible taking some outstanding work from a community of developers. With open source software, I can put software to work immediately without much effort.

As a developer leveraging that technology, and not developing it yourself, you get the added benefit of very minimal maintenance on that piece of your software. If there is a bug, the community taps the collective pool of expertise. When someone adds to a project, everyone can take advantage of that innovation. The advantages of this approach range from greater reliability to a more rapid pace for innovation.

And I would definitely recommend giving back to the community wherever possible.

When you want to use open source software, what is your process for testing and determining what you can do with a particular library or component?

That’s a very good question. This is my favorite part actually.

Because there are so many great open-source technologies out there I get to play with all of them when considering which component(s) to use. I don’t have a particular process that I use to evaluate the software. I have a clear idea of what I need out of the component before I begin the evaluation. If there are similar components I will try to match each of them up to one another and determine which one fits my requirements the best.

Is this work or play? You seem quite enthusiastic about what strikes me as very complicated technical work?

To be candid, I find exploring, learning, and building enjoyable. I can’t speak for the other technologists at Digital Reasoning, but I find this type of problem-solving and analytical work both fun and rewarding. Maybe “play” is not the right word, but I like the challenge of this type of engineering.

Quite a few companies are supporting open source, including IBM. In your view, will more companies be developing with open source in mind?

Yes, I definitely believe that more and more companies will begin supporting the open-source community simply because of the vast amount of benefits they can gain.

As a strategic move to support open-source a company could easily reduce development costs by “outsourcing” development to a particular piece of community-supported technology rather than developing it themselves.

The use of open source means that an organization not only gets access to a piece of software that is not completely developed in-house, but also gets to interface with some potential candidates for employment, contribute to fostering new ideas, and work within a community that is very passionate about what it is contributing to.

What next for Digital Reasoning and open source?

Our commitment to open source is strong. We have a number of ideas about projects. Look for further announcements in the future.

How can a person get more information about Digital Reasoning?

Our Web site is www.digitalreasoning.com. I know that you have interviewed our founder, Tim Estes, on two separate occasions, and there is a great deal of detailed information in those interviews as well. We have also recently announced Synthesys® Platform as a beta program allowing API access to our “big data” analytics with your data where we take complete responsibility for managing the cloud resources. More information about this new program can be found at http://dev.digitalreasoning.com.

Beyond Search Comment

A number of companies have embraced open source software. In an era of big data, Digital Reasoning has identified open source technology that helps cope with the challenges of peta-scale flows of structured and unstructured content. The firm’s new version of its flagship Synthesys service delivers blistering performance and easy-to-understand outputs in near-real time. Open source software has influenced Digital Reasoning and Digital Reasoning’s contribution to the open source community helps make useful technical innovations available to other developers.

Our view is that Digital Reasoning is taking a solid engineering approach to service its customers.

Stephen E Arnold, January 12, 2011

Synthesys Platform Beta Available

February 7, 2011

Digital Reasoning alerted us last week that a new beta program for the Synthesys Platform is available. Digital Reasoning and its Synthesys platform have emerged as one of the “leaders in complex, large scale unstructured data analytics.” We have interviewed the founder of Digital Reasoning in ArnoldIT.com’s Search Wizards Speak series; the interviews are available here and here. Digital Reasoning is one of the leaders in making next-generation analytics available via the cloud, on premises, and hybrid methods.

[Image © Digital Reasoning, 2011]

This platform version of Digital Reasoning’s software will provide beta users immediate API-level access to the firm’s analytics software and access to tools that will be added through the beta program.

Matthew Russell, vice president of engineering at Digital Reasoning, said:

We are excited to introduce Synthesys Platform to the market. By allowing users to upload their data into the cloud for analysis, many more users will get the opportunity to experience next generation data analytics while exploring their own data.

Digital Reasoning Systems (www.digitalreasoning.com) solves the problem of information overload by providing the tools people need to understand relationships between entities in vast amounts of unstructured and structured data.

Digital Reasoning builds data analytic solutions based on a distinctive mathematical approach to understanding natural language. The value of Digital Reasoning is not only the ability to leverage an organization’s existing knowledge base, but also to reveal critical hidden information and relationships that may not have been apparent during manual or other automated analytic efforts. Synthesys is a registered trademark of Digital Reasoning Systems, Inc.

Digital Reasoning will be exhibiting at the upcoming Strata Conference on February 28 and March 1, 2011. For more information about Digital Reasoning, navigate to the company’s Web site at www.digitalreasoning.com.

Stephen E Arnold, February 7, 2011

US Census Counts with Endeca

February 4, 2011

Endeca has hit the metaphorical nail on the head with one of its latest endeavors. The US Census Bureau is now using Endeca Technologies’ business intelligence software, Endeca Latitude, to launch its new American FactFinder. We learned in “US Census Bureau launches New American FactFinder on Endeca”:

“American FactFinder makes more than 250 billion decennial census facts available and navigable to the average American, civil servants and skilled statisticians alike.”

After a preliminary rollout of American FactFinder, the Bureau challenged itself to redo the site to provide easier access for non-professional or non-expert users. Endeca allows users to search within a specified taxonomy to receive relevant search results as well as guidance on how to access them.

I had the chance to play with American FactFinder and I must say, go, Census! Get that 2010 data into the system. Lots of changes since Year 2000. Point-and-click, canned PDF reports, and, oh, 2010, data. Did I mention that? Year 2000 data.

Leslie Radcliff, February 4, 2011

Freebie

New Boss, Slow MSFT Loss?

February 2, 2011

Microsoft is losing money in some market segments. See Business Insider’s “Chart of the Day: Microsoft Incinerates Another $543 Million Online.” Here you’ll find a striking chart which shows that Microsoft’s online ventures (search, pretty much) have been bombing financially since the summer of ’06. That kind of dough would buy a lot of pals here in Harrod’s Creek.

Computerworld’s “Microsoft’s Windows revenues plunge 30%” illustrates that the company can’t rest on its Windows sales, either; Windows revenues in the last quarter of 2010 were disappointing. Depending on how you look at the numbers, receipts are either down 30% or up 3%. Either way, we’re sure Microsoft was hoping for better.

Will a change in executives save the online branch? “Microsoft to go inside for its new head of Online Services” at ZDNet reports the selection of Lee Nackman to replace retiring Corporate Vice President of Microsoft Online Services, Dave Thompson. Nackman will be shedding his current title of Corporate Vice President, Directory, Access, and Information Protection. About the company’s plans, ZDNet’s Mary-Jo Foley writes:

“Microsoft is creating a common platform across its individual and packaged Online services. The goal is to make Office 365 and its component parts . . . as well as new Microsoft Online services . . . based on a common billing, provisioning and commerce platform. A common dashboard will allow users to manage any/all of the Microsoft Online services.”

Sounds good. We’ll see whether Mr. Nackman is up to the task. We hope he pays attention to the SharePoint search situation too.

Cynthia Murrell, February 2, 2011

Freebie

Libraries and Business Intelligence

February 2, 2011

Years ago, maybe 1984, Carol Galvin and I started a dead tree newsletter called “Marketing Library Services” or MLS. Clever, eh? I don’t recall our covering the subject of libraries and business intelligence. Imagine my delight when I saw Mary Hayes Weier’s “New York Library Looks To BI Software For Help.” Please, read the story here. The main idea was:

…knowing at what times people are using computers helps make decisions about the best hours for staffing personnel with computer skills. “It’s allowing us to view the different relationships between print and non-print materials, and to ask the right questions,” Gillinson [NY library executive] said.

Will business intelligence become a key component in libraries’ cost control and patron services? I think it will become more important. In my experience, complexity can undermine the utility of some important tools. Libraries face financial headwinds. I am not sure business intelligence can streamline some library systems because what’s needed is money.

Stephen Arnold, February 2, 2011

This post went up in March 2009 but an incorrect tag prevented it from displaying. The goslings and I honk, “Sorry.”

Collecta to Reconfigure

February 2, 2011

Collecta has totally changed direction, says Mashable.com in the article “Startup Collecta Shuts Down Its Product, Working on a New One.” The real-time search engine that launched in 2009 has been closed, and the company has decided to concentrate its efforts on new ideas. Most of Collecta’s money is still in the bank, and the company is retaining many of its original employees.

Gerry Campbell, the CEO, states that while running the real-time search engine they learned three lessons: there’s a huge need for real-time information, a destination site is not the way to reach people, and new trends, such as Facebook, are growing.

“It’s interesting to note that Collecta’s major rival in the real-time search space, OneRiot, also completely changed product directions this year, dumping its search engine and moving into the online ad game.”

Everyone appears to be switching to the real-time market. While this is where the money appears to be going at the moment, I wonder who will foot the bill?  Is this a pivot point for social content search engines? Could be.

Whitney Grace, February 2, 2011

Freebie

Business Intelligence Revolution. Really?

January 30, 2011

“Gartner Predicts Business Intelligence Revolution” reports on the four key trends that Gartner predicts for BI.  One: a 33% rise in BI tools on handheld devices by 2013 / shift to dedicated mobile analytic applications by 2012.  Two: 40% of BI spending on systems integrators by 2014.  Three: 15% of all BI tool deployments containing social software/collaboration elements in their decision-making environments by 2013.  Four: 30% of analytic applications becoming more proactive and predictive in their forecasts and using in-memory functionality to increase scale and computational speeds by 2014.  Not exactly earth-shaking, and others are commenting that these numbers are too conservative.  “Roger Llewellyn, chief executive of BI and analytics firm Kognitio, argued that Gartner’s estimates are too cautious, and that the rise in the use of BI within other areas of businesses will drive the use of new systems.”  Llewellyn says the 30% of analytic applications using in-memory functions figure should be more like a “surge.”  As most of these developments have been around for a while, I would tend to agree.  Same old wine, new blue chipped bottle. We have heard this before. In fact, one big dog search firm won’t use the phrase “business intelligence”. Good idea that. I don’t know what business intelligence is. Do you?

Alice Wasielewski, January 30, 2011
