Statistics for the Statistically Inclined

June 10, 2011

Due to a strong bias against everyone’s favorite search engine, it is difficult for me to become excited over new Google developments.  However, having endured a number of statistics classes, I will certainly give credit where credit is due.

I was recently directed to Google Correlate and spent a solid twenty-five minutes entertaining myself by testing statistical relationships.  The service compares an uploaded data set against real search query data courtesy of the search mogul.  Google ranks results by the Pearson correlation coefficient (r), surfacing the queries whose r is nearest to 1.0; that is, the most positively correlated queries.  One can customize the results in a number of ways: for negative relationships, against a time series or a regional location, for a normalized sine function or a scatter plot, etc.
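For anyone who wants to see the statistic the service ranks by, here is a minimal sketch of the Pearson r computation. The function name and the toy data are ours, not Google's, and Google Correlate obviously runs this at a very different scale:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two series that move together score 1.0; a series and its mirror score -1.0.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

Correlate's "most positively correlated" list is simply the queries whose frequency series yields the highest r against the uploaded target.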

For any glazed-over eyes out there, the Web site sums up the intent this way:

“Google Correlate is like Google Trends in reverse. With Google Trends, you type in a query and get back a series of its frequency (over time, or in each US state). With Google Correlate, you enter a data series (the target) and get back queries whose frequency follows a similar pattern.”

Don’t worry, there is a tutorial.

It should also be noted that this service is tagged as “experimental.”  I fear that, due to lack of popularity, it may dissolve in its very own time series in sad, monthly increments.

I imagine this tool is providing certain students some relief, but what of regular users?  In the words of the head gander, how many Google mobile users know what “correlate” means?  Without crunching the data, I think our r may be approaching -1.0.

Sarah Rogers, June 10, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

ProQuest: A Typo or Marketing?

June 10, 2011

I was poking around with the bound phrase “deep indexing.” I had a briefing from a start-up called Correlation Concepts. The conversation focused on the firm’s method of figuring out relationships among concepts within text documents. If you want to know more about Correlation Concepts, you can get more information from the firm’s Web site at http://goo.gl/gnBz6.

I mentioned to Correlation Concepts Dr. Zbigniew Michalewicz’s work in mereology and genetic algorithms and also referenced the deep extraction methods developed by Dr. David Bean at Attensity. I commented as well on some of the methods disclosed in Google’s open source content. But Google has become less interesting to me as new approaches have surfaced. Deep extraction requires focus, and I find it difficult to reconcile focus with the paint gun approach Google is now taking in disciplines far removed from my narrow area of interest.


A typo is a typo. An intentional mistake may be a joke or maybe disinformation. Source: http://thiiran-muru-arul.blogspot.com/2010/11/dealing-with-mistakes.html

After the interesting demo given to me by Correlation Concepts, I did some patent surfing. I use a number of tools to find, crunch, and figure out which crazily worded filing relates to other, equally crazily worded documents. I don’t think the patent system is much more than an exotic work of fiction and fancy similar to Spenser’s The Faerie Queene.

Deep indexing is important. Keyword indexing does not, in some cases, capture the “aboutness” of a document. As metadata becomes more important, indexing outfits have to cut costs. Human indexers are like tall grass in an upscale subdivision: someone is going to trim that surplus. In indexing, humans get pushed out in favor of fancy automated systems. Though initially more expensive than humans, the automated systems don’t require retirement benefits, health care, or much management. The problem is that humans still index certain content better than automated systems do. Toss out high quality indexing and insert algorithmic methods, and you get search results which can vary from indexing update to indexing update.


Digital Reasoning Adds Chinese Support to Synthesys

June 9, 2011

“Digital Reasoning Introduces Chinese Language Support for Big Data Analytics,” announces the company’s press release. This latest advance from the natural language wizards acknowledges the growing prevalence of Chinese on the Web. The support augments the firm’s premier product, Synthesys:

“Synthesys can now analyze the unstructured data from a variety of sources in both English and Chinese to uncover potential threats, fraud, and political unrest. By automating this process, intelligence analysts can gain actionable intelligence in context quickly and without translation.”

This key development is the sort of thing that makes us view Digital Reasoning as a breakout company in content processing. Their math-based approach to natural language analytics puts them ahead of the curve in this increasingly important field. Synthesys has become an essential tool for government agencies and businesses alike.

This support for Chinese is just the beginning. Rob Metcalf, President and COO, knows that “the next generation of Big Data solutions for unstructured data will need to natively support the world’s most widely spoken languages.”

We’re delighted to see Digital Reasoning continue to excel.

Cynthia Murrell, June 8, 2011


SAS Simplifies Text Analysis

June 8, 2011

Let’s face it, time is money. (Some former SEO Panda victims, assorted art history majors, and a few MBAs perceive time as opportunity to contemplate the magnitude of their student loans and monthly cash flows.)

Wading through archives to find the answers to your questions is labor intensive and more work than watching reruns on TV.

We found that “New SAS Industry Taxonomy Rules Starter Kits Enhance the Speed to Value of Text Analytics” promises answers to some of these problems. It can cut search time from months to weeks by creating a structured taxonomy. The story asserted:

Building taxonomies from scratch can be daunting. But with the new SAS Industry Taxonomy Rules starter kits, organizations get a jump start, and can move more quickly from document and text chaos to value and insight from their unstructured data.

However, the use of taxonomies isn’t anything new. Many businesses recognize the value of categorizing their archives to save time and, in the end, money.

The same goes for digital archives of electronic documents. SAS helps organize electronic documents and save valuable time and money with some of the most effective integrated capabilities on the market today, combining structured and unstructured data. It also uses predictive analytics to track which documents or areas are searched most, and it allows customers to customize their systems to fit individual needs.

Sounds like a win-win.

Leslie Radcliff, June 8, 2011


Exalead Makes a Sage Move

June 1, 2011

We have no qualms over recurrent expressions of our appreciation and enthusiasm for the Exalead brand.

A long-time leader in the field of search-enabled applications and data management software, the company continues to prove itself relevant in a landscape that shifts more frequently than the iTunes Recent Hits page.

The most recent news we saw about Exalead, a unit of Dassault Systèmes, comes in the form of a deal with the Sage Group. Sage is one of the leaders in enterprise resource planning (ERP). Sage will use Exalead’s technology in the Sage ERP X3 system.

The write up “Sage Innovates with Exalead CloudView to Enhance Its ERP User Experience” said:

CloudView brings the speed and simplicity of consumer Web search to the Sage ERP X3 user experience, offering flexible natural language search across all Sage database content, including both data and metadata. Offered as a simple drag-and-drop Gadget in the Sage portal, CloudView-powered Sage Search enables users to locate information anywhere in the system using a single text box: no training, complex forms or SQL queries required. Moreover, fuzzy matching and flexible search refinement by dynamic results categories help ensure search success even when a user’s query is incomplete, misspelled or imprecise.
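The “fuzzy matching” the write up mentions is commonly implemented with an edit-distance measure such as Levenshtein distance. Exalead does not disclose its exact method, so the sketch below is illustrative only; the function name and sample strings are ours:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))  # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                  # deletion
                cur[j - 1] + 1,               # insertion
                prev[j - 1] + (ca != cb),     # substitution (0 if chars match)
            ))
        prev = cur
    return prev[-1]

# A misspelled query still lands one edit away from its target term,
# so a fuzzy matcher can surface "invoice" for the query "invoce".
print(levenshtein("invoce", "invoice"))  # 1
```

A search system would typically accept candidate terms whose distance to the query falls under a small threshold, which is how an incomplete or misspelled query can still succeed.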

CloudView may give Sage a turbo boost. With this deal, Sage and Exalead jump up the enterprise charts to super group status.

Micheal Cory, June 1, 2011


EMC: Lots of Initiatives and Now an Appliance

June 1, 2011

EMC has been busy. The company has announced a wide range of initiatives; the flow of announcements has been overwhelming. We did notice “SAS Will Be Available On A Database Appliance From EMC.” SAS has announced that it will begin to offer SAS High Performance Analytics. The system will be available on an EMC database appliance.

The blog asserted:

This new offering from SAS on the EMC Greenplum Data Computing Appliance will provide an environment for customers to perform analytical exploration and development on all data to complement their regular analytic operations.

Clients will be able to build models that take into account data from every department and showcase all the possible scenarios. Seeing the whole picture gives customers a more accurate view and enables them to make better decisions. In addition, when compared to current technology, SAS High-Performance Analytics blows the competition out of the water, solving problems in seconds rather than hours. This appliance could be in the running for best in class.

However, with appliances proliferating in some organizations, management of yet another toaster is, in our experience, beginning to generate some pushback.

April Holmes, June 1, 2011


The Analytics Path: Search Sits at the Kerb

May 30, 2011

According to the Technology Review article “The Future of Analytics,” IBM is working on the next generation of analytics technology and has set out to develop technology that can handle massive amounts of data. The team, led by Chid Apte:

“is developing algorithms and other techniques that can extract meaning from data, and it is trying to find ways to use these methods to solve business challenges.”

In his interview with Tom Simonite, Apte indicated that the company was trying to take company data as well as social data and work with clients to see how both sources can be used to address business problems. The team even helped develop the question-answering technology used by Watson on Jeopardy, and it hopes to bring this QA problem-solving technology into its system.

Apte concluded by emphasizing the ever-present need for a better way to handle large-scale data. If IBM can pull it off, it will have hit the jackpot.

IBM has a Tundra truck stuffed with business intelligence, statistics, and analytics tools. IBM has no product. IBM, in my view, has an opportunity to charge big bucks to assemble these components into a system that makes customers wheeze, “No one ever got fired for buying IBM.”

Well, it used to be true. And it is probably true for MIT grads. Today? Maybe. Tomorrow? Maybe not.

April Holmes, May 30, 2011


Consultant Benchmarks Business Intelligence

May 27, 2011

Business intelligence has seen tremendous growth. With so many different companies on the market vying for clients, it can be difficult for business owners to know exactly which one will adequately fit their needs.

We learned that InetSoft is a sponsor of the Aberdeen Group’s Agile BI Benchmark Study, which provides a detailed survey and analysis of how companies are currently using their business intelligence products and how they can improve.

We found the notion of agile business intelligence interesting. Traditionally business intelligence required trained specialists and programmers with the ability to convert an end user’s dreams into the cold, hard reality of a report. Today end users want to do their own report building and data analysis. In our experience, this sounds great in a pitch focused on reducing headcount. However, in some situations, flawed data leads to even more suspect business decisions.

We learned from the announcement about the study that:

Agile BI is business intelligence that can rapidly adapt to meet changing business needs.

Okay.

Many of those surveyed admitted they were not delivering their business intelligence products on time and found it difficult to make timely decisions. Those companies that earned “Best In Class” were those that were able to provide fully interactive BI to their users. The write up asserted:

Managers need to get “hands-on” to interact with and manipulate data if they are to meet the shrinking timeframe for business decisions that they face.

Without building a solid foundation and taking control, BI cannot be fully effective; it is like a bird with no wings.

To obtain a complimentary copy of the report, visit http://goo.gl/3WujV. We have no idea how long the free report will be available. Act quickly.

Stephen E Arnold, May 27, 2011

Freebie

More from IBM Watson: More PR That Is

May 19, 2011

IBM keeps flogging Watson, which seems to be Lucene wrapped with IBM goodness. We have reported on the apparent shift in search strategy at IBM; to wit, search now embraces content analytics. Many vendors are trying to spit-shine worn toe-cap oxfords in an effort to make search into a money machine. Good luck with that.

Network World tells us that “Watson Teaches ‘Big Analytics.’” Ah, more Watson hyperbole.

Skillful big analytics is necessary to make use of big data, of course, and in most cases speed is also a factor. Watson demonstrated proficiency at both with its Jeopardy win. Now, IBM hopes to use those abilities in enterprise products. As well they should; the need for such tools is expanding rapidly.

“Businesses successfully utilizing big analytics can take this process of knowledge discovery even further, identifying questions, exploring the answers and asking new questions based on those answers. This iterative quality of data analysis, rather than incremental exploration, can lead to a deeper understanding of business and markets, and begin to answer questions never before considered.”

Yep, we think we get it: Big data and a robust big analytic product are increasingly necessary to stay competitive. What we want to know, though, is this: when is all this going to change Web or Internet search? When will the Watson product be “a product”? Enough PR. That’s easy. How about a useful service we can test and compare to other systems?

Cynthia Murrell, May 19, 2011

Freebie

New Landscape of Enterprise Search Details Available

May 18, 2011

Stephen E Arnold’s new report about enterprise search will be shipping in two weeks. The New Landscape of Enterprise Search: A Critical Review of the Market and Search Systems provides a fresh perspective on a fascinating enterprise application.

The centerpiece of the report is a set of new analyses of search and retrieval systems offered by:

Unlike the “pay to play” analyses from industry consultants and self-appointed “experts,” Mr. Arnold’s approach is based on his work developing search systems and researching search systems to support certain inquiries into systems’ performance and features.

The report focuses on the broad changes which have roiled the enterprise search and content processing market. Unlike his first “encyclopedia” of search systems and his study of value-added indexing systems, this new report takes an unvarnished look at the business and financial factors that make enterprise search a challenge. Mr. Arnold then uses a historical base to analyze the upsides and downsides of six vendors’ search solutions, putting each firm’s particular technical characteristics in sharp relief. A reader gains a richer understanding of what makes a particular vendor’s system best suited for specific information access applications.

Other features of the report include:

  • Diagrams of system architecture and screen shots of exemplary implementations
  • Lists of resellers and partners of the profiled vendors
  • A comprehensive glossary which attempts to cut through the jargon and marketing baloney which impedes communication about search and retrieval
  • A ready-reference table for more than 20 vendors’ enterprise search solutions
  • An “outlook” section which offers candid observations about the attrition and financial health of the hundreds of companies offering search solutions.

More information about the report is available at http://goo.gl/0vSql. You may reserve your copy by writing seaky2000 @ yahoo dot com. Full ordering information and pricing will be available in the near future.

Donald C Anderson, May 18, 2011

Post paid for by Stephen E Arnold
