SAS Juices Up Text Mining

December 20, 2010

SAS has updated its Predictive Analytics & Data Mining page. Of particular note is the updated version of SAS Text Miner, which can be used to identify trends in unstructured text without the user having to be familiar with its contents.

Text Miner “provides complete views and meaningful insights within an integrated predictive modeling environment. Automating manual comprehension of the textual data sources, incorporating interactive drill-down reporting, and delivering algorithms for rigorous advanced analyses make it possible to grasp future trends and act on new opportunities more efficiently and with less risk.” Version 4.2 adds a high-performance search capability, enhanced spell-checking, processing of multiple topics per document, and new text parsing, topic, and filter nodes.

What distinguishes SAS Text Miner from other text mining solutions is that SAS has the best data mining algorithms and the simplest interface for managing and importing data, and it integrates its text mining capabilities into its data mining solution better than anyone else.

Alice Wasielewski, December 20, 2010

Gephi: The Open Source Graphing Tool

December 20, 2010

A New Year’s Day treat!

Want to use open source software to create graphs with lots of pretty colors? The O’Reilly Radar recommends “Strata Gems: Explore and Visualize Graphs with Gephi.” This program allows you to turn any form of data into a graph. Gephi is an open source project well suited to analyzing networks and data. It runs on all the major operating systems and is described as a “Photoshop for data.”

“Graphs can be loaded and created using many common graph file formats, and explored interactively. Hierarchical graphs such as social networks can be clustered in order to extract meaning. Gephi’s layout algorithms automatically give shape to a graph to help exploration, and you can tinker with the colors and layout parameters to improve communication and appearance.”

Another great feature Gephi offers is that it is extensible through plugins. These will allow you to export and publish the data on the web and experiment with other layouts. Gephi appears to be a quick and easy way to study data, plus the color options will keep your artistic side happy. Get Gephi at http://gephi.org/.

Whitney Grace, December 20, 2010

Freebie

Arnold Comments about Exalead

December 20, 2010

A couple of times a year, I make a swing through Europe. I visit vendors, get demos, and talk with engineers about the future of search. In Paris on November 30, 2010, I answered questions about my views of Exalead. As you know, Exalead is a unit of Dassault Systèmes, one of the most sophisticated engineering firms in the world. You can get my view of Exalead by navigating to this link. Here’s an example of the observations I made:

“Exalead delivers applications that fit seamlessly and smoothly into customer workflows,” said Arnold. “When I spoke with Exalead customers I heard only: ‘This system works,’ ‘It’s easy to use,’ ‘It’s stable,’ and ‘I don’t have to chase around.’”

In the interview, I point out that Exalead’s engineering makes it possible to embed search and information access in applications. Instead of using keywords to unlock the information in a traditional search and retrieval system, Exalead makes the needed information available within existing workflows and applications. Access extends across a full range of content types and devices, including smartphones.

I have tracked Exalead for a number of years, and it continues to distinguish itself in information access by going “Beyond Search.” Here at Beyond Search we use the Exalead platform for our Overflight service.

Stephen E Arnold, December 20, 2010

The Exalead engineering team bought me lunch, a plus in Paris. Too bad about the snow and ice, though.

Google, Multiple Operating Systems, and the Mad Scramble

December 19, 2010

I thought only politicians changed their tune. Navigate to “The Cloud OS” and you will see that even wizards and former Math Club members can crawfish with the best of the Washington DC big wheels. Xooglers have, in my opinion, a schizophrenic knife edge. On one hand, Google gave them the moxie to be world beaters. On the other hand, Xooglers are no longer part of the Google.

The point of “The Cloud OS” is, well, it’s okay for Google to be Google. I don’t have any problem with a multi billion dollar company doing what it thinks furthers the shareholders’ interests. I am ambivalent about Google’s multiple operating system approach. I think most users don’t know an operating system from a solid state drive. Computing is on a trajectory to work like toasters. I don’t have a strong opinion about that shift either.

Here’s a passage from the write up that caught my attention:

One way of understanding this new architecture is to view the entire Internet as a single computer. This computer is a massively distributed system with billions of processors, billions of displays, exabytes of storage, and it’s spread across the entire planet. Your phone or laptop is just one part of this global computer, and its primary purpose is to provide a convenient interface. The actual computation and data storage is distributed in surprisingly complex and dynamic ways, but that complexity is mostly hidden from the end user.

The big question is, “Who decides what does a function and when?” The answer, in my opinion, is the Math Club, Xooglers, and others of that ilk. The operating system is indeed irrelevant to the user. What matters is the control of the information utility.

Forget Google. Forget Gmail. Forget whatever hook one uses to think about a giant company controlling information plumbing. The physics of information works like the good old physics taught in grad school. In systems, strange attractors grab hold and structures emerge. The idea for online information is to “own” one of those emergent structures. Other, smaller structures exist, but the physics of information becomes interesting when one of these big, emergent systems snags “energy”. In information one can measure energy in money, clicks, volume of data, or some other situational metric. The idea, however, is that once a big emergent structure becomes manifest, that structure calls the shots.

So the chatter about operating systems is useful, but it is like talking about behavior at a boundary condition. The main event is the emergent system, which may contain substructures. Although interesting, the substructures are subordinate to the main idea: control.

What’s this mean to Facebook, Google, and similar companies? A two-class world. The builders and the users. Medieval, Dark Ages, paternal? These terms are indeed suggestive. The focus is the system, not the players. The physics of information suggests constant change, and when new structures emerge a bit of desperation becomes discernible. Today’s dominant system may be tomorrow’s LTV or Enron because permanence is tough when bytes collide. The mad scramble is a nibble of revisionism, but instructive nevertheless. Just my opinion.

Stephen E Arnold, December 19, 2010

Freebie unlike ads on Facebook and Google

IBM Chases Predictive Analytics Opportunities

December 18, 2010

IBM was once a top technology provider, but over the last few years it seems to have lost its oomph, perhaps even slipping into decline.

According to the ThomasNet News article “New IBM Predictive Analytics Software Personalizes Customer Relationship Strategies,” IBM seems to be trying to bounce back with its new predictive analytics software. The company is pushing into the social media world and promises that with its SPSS Modeler “users can uncover and analyze information from social media sources, such as social networks and blogs and then merge that with internal data for accurate insight and predictive intelligence.”

More importantly, companies could then use the data to better understand their customers and to guide marketing and product development. Data analytics providers and the social media world are flourishing, and IBM seems to be trying to enter the game. However, it’s likely that IBM will be benched and forced to watch from the sidelines.

At the same time, SAS appears to be ramping up its effort in this sector as well. The battle of the statistics superstars is underway. Maybe a cable TV reality show here, gentle reader?

April Holmes, December 18, 2010

Freebie

For You N-Gram Fans

December 17, 2010

There are grams and n-grams. If you have not looked for occurrence data in the GOOG, navigate to http://ngrams.googlelabs.com/. If the link does not resolve, go to Google.com and enter the query “Google Books Ngram Viewer.” With a bit of effort, you can fire words and phrases at the Google Books index and see counts.
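An n-gram is simply a run of n consecutive words, and occurrence data is a tally of how often each run appears. Here is a minimal Python sketch of raw bigram counting over a toy sentence; the function name and sample text are hypothetical. The Ngram Viewer itself works over the Google Books corpus and, as far as we know, plots each phrase's yearly share of all n-grams of that length rather than raw counts, so treat this only as an illustration of the idea.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count word n-grams (hypothetical helper): lowercase, keep letter runs, slide a window of n words."""
    words = re.findall(r"[a-z']+", text.lower())
    # zip over n staggered copies of the word list yields each consecutive n-word window
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


sample = "information warfare and information factory; information warfare again"
counts = ngram_counts(sample, 2)
print(counts["information warfare"])  # 2
print(counts["information factory"])  # 1
```

A phrase that never occurs simply comes back as 0, which is what a query with no hits looks like.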

I tested the phrase “information factory” and got no hits. My publisher has not made available the monograph in which I used the phrase in the mid 1990s. I ran a query on “information warfare” and there were no hits. Your queries may be more productive. The goose is too narrow for the service.

Stephen E Arnold, December 17, 2010

Freebie

Exclusive Interview with Kapow Software Founder

December 14, 2010

Our sister information service, Search Wizards Speak, published an exclusive interview with Stefan Andreasen, the founder of Kapow Software. You can read the full text of the discussion on the ArnoldIT.com Web site.

Kapow is a fast-growing company. The firm offers tools and services for what is called data integration. Other ways to characterize the firm’s impressive technology include data fusion, mashups, ETL (jargon for extracting, transforming, and loading data from one system to another), file conversion, and slicing and dicing. The technology works within a browser and can mobile-enable any application, integrate cloud applications, and migrate content from one system to another.
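ETL is the most concrete item in that list. Below is a minimal Python sketch of the three steps using only the standard library. The CSV source, the column names, and the SQLite target are hypothetical stand-ins. The sketch says nothing about how Kapow's products actually work.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system; column names are illustrative only.
SOURCE = "sku,qty\nabc-1,3\nxyz-9,12\n"


def extract(text: str) -> list[dict]:
    """Extract: parse rows out of the source system's CSV export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple[str, int]]:
    """Transform: normalize SKUs to upper case and quantities to integers."""
    return [(r["sku"].strip().upper(), int(r["qty"])) for r in rows]


def load(records: list[tuple[str, int]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned records into the target database."""
    conn.execute("CREATE TABLE IF NOT EXISTS inventory (sku TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO inventory VALUES (?, ?)", records)
    conn.commit()


def run_etl(text: str) -> list[tuple[str, int]]:
    """Run extract -> transform -> load into an in-memory database and read the result back."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    rows = conn.execute("SELECT sku, qty FROM inventory ORDER BY sku").fetchall()
    conn.close()
    return rows


print(run_etl(SOURCE))  # [('ABC-1', 3), ('XYZ-9', 12)]
```

Keeping each step a separate function mirrors the jargon: any one step can be swapped, for example a different target database, without touching the other two.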

In the interview, Mr. Andreasen said about the spark for the company:

As soon as we started building the foundational technology at Kapow.net in Denmark, I knew we were on to something special that had broad applicability far beyond that company. For one, the Web was evolving rapidly from an information-hub to a transaction-hub where businesses required the need to consolidate and automate millions of cross-application transactions in a scalable way. Also, Fortune 1000 companies were then and, as you know, even more so today, turning to outsourced consultants and hordes of manual workers to do the work that this innovation could do instantly.

On the subject of car manufacturer Audi’s use of the Kapow technology, he added:

In one use case, Audi, the automobile manufacturer, was able to eliminate dependencies, streamline their engineering process, and minimize the time-to-market on their new A8 model. Audi employs Katalyst to integrate data for their state of the art navigation system, called MMI, which combines Google Earth with real-time data about weather, gas prices, and other travel information, customizing the driver’s real-time experience according to their location and taste preferences. In developing the navigation system, Audi had relied on application providers to write custom real-time APIs compatible with the new Audi system. After months of waiting for the APIs and just two weeks away from the car launch date, Audi sought Kapow’s assistance. Katalyst was able to solve their problem quickly, wrapping their data providers’ current web applications into custom APIs and enabling Audi to meet their target launch date. By employing Kapow, Audi is now able to quickly launch the car in regional markets because Katalyst enables the Audi engineers to easily change and integrate new data sources for each market, in weeks rather than months.

For more information about Kapow, navigate to www.kapowsoftware.com. The full text of the interview is at http://www.arnoldit.com/search-wizards-speak/kapow.html.

Kenneth Toth, December 14, 2010

Freebie

Real Time Conversation with a Mid Tier Wizard

December 9, 2010

I am not making this conversation up. I gave a talk to 43 twenty-somethings at Skinker’s, a delightful place near the London Bridge tube stop. No, I did not buy a Skinker’s T-shirt, but it did look smart. My topic was real time search. More accurately, I was explaining the engineering considerations in delivering low latency indexing and querying, which most vendors and second-string consultants happily tell you is “real time search”.

The most interesting part of my evening was a short conversation I had with a mid tier consultant, what I call an azure chip consultant or, generally, the azurini. To be a blue chip consultant is easy. Just get hired by one of the two, three, or four management consulting firms, do some notable work, and not die of a heart attack from the pressure. Thousands of Type A’s who crave constant stroking take a toll, believe you me. The mid tier lad introduced himself. He reminded me that I had met him before. In the dim light of Skinker’s I would not have been able to recognize Tess, my deaf white boxer. No matter. A big grin and warm handshake were what the azure chip lad thought would jog my memory. It didn’t.

[Diagram: gating points in a content processing system]

The basic idea is that real time is not achievable. There are gating factors at three main points in any content processing system. The first is the green box, which is the catch-all for the service providers, ISPs, and others in the network chain. The pink boxes represent the vendors providing services to the client who wants low latency service. The yellow boxes represent the different “friction points” behind the firewall or within the organization’s hybrid infrastructure. Resolving these points of “friction” boils down to brains and money. If an organization lacks either, the latency of the system will be high and will increase over time. Users, of course, don’t know this. The problems latency produces range from financial losses to field operations personnel being killed due to stale intelligence.

Anyway, three observations.


Digital Reasoning Unleashes Synthesys Version 3

December 6, 2010

Our sister publication covers the dynamic world of data fusion and next generation analytics. I wanted to call your attention to an interview with Tim Estes, the founder of Digital Reasoning. The company has announced a new version of its Synthesys product. You can read a complete, far-ranging interview with Mr. Estes in the Search Wizards Speak series at this link. Our analyses of the Digital Reasoning technology are most encouraging.

Here’s a snippet of the interview’s contents from the Inteltrax story which ran earlier today:

“Synthesys V3.0 provides a horizontally scalable solution for entity identification, resolution, and analysis from unstructured and structured data behind the firewall,” Estes said when asked about Digital Reasoning’s new offering. “Our customers are primarily in the defense and intelligence market at this point so we have focused on an architecture that is pure software and can run on a variety of server architectures.” In addition, the program is ripe with features that are miles beyond previous versions. “We’ve enhanced and improved the core language processing in dramatic ways. For example, there is more robustness against noisy and dirty data. And we have provided better analytics quality. We have also integrated fully with Hadoop for horizontal scale. We probably have one of the most flexible and scalable text processing architectures on the market today.”

While the company still works heavily with the government, Synthesys technology will benefit several other fields. “We are getting a good bit of interest from companies that need what I call ‘big data analytics’ for financial services, legal eDiscovery, health care, and media tasks.” For example, the program “can identify the who and the what, map the connections, and deliver the key insights.” Estes continues, “instead of clicking on links and scanning documents for information, Synthesys Version 3.0 moves the user from reading a ranked or filtered set of documents to a direct visual set of facts and relationships that are all linked back to the key contexts in documents or databases. One click and the user has the exact fact. Days and hours become minutes and seconds.”


Pricing 2011

December 2, 2010

By the time you read this article, the deal train may have left the station. Beyond Search is not a news publication, much to the chagrin of the Buffies and Trents who work in the “real news” game. The information in “Big Sale: Get Intellexer Summarizer and Categorizer with 50% Discount” is of interest to us in Harrod’s Creek because it hints at pricing in 2011. According to the write up:

Save up to 50% with special sale offer from EffectiveSoft by ordering Summarizer or Categorizer tools in this year…. Intellexer Summarizer and Categorizer are semantic solutions intended for knowledge retrieval and data management. Categorizer will automatically organize a large amount of text files, and Summarizer will spare you reading the entire document and save your time for leisure. EffectiveSoft’s products are based on semantic platform Intellexer SDK (a unique product released by R&D department for knowledge management).  In addition to proprietary products development EffectiveSoft company enhances existing customer application with the power of semantic technologies.

EffectiveSoft is located in Minsk, Belarus and was founded in 2000. The company is a Microsoft Certified Gold Partner. More information is available at http://www.effectivesoft.com/.

Translation, summarization, leveling up, and bird’s-eye views are spilling into and across market segments. Customer support, business intelligence, and eDiscovery vendors want to process multiple languages and perform a range of content “value adds”. You can learn more about this particular offer at:

http://summarizer.intellexer.com
http://categorizer.intellexer.com

The question we asked ourselves was, “Will a lower price expand the market for these types of content processing systems?”

We know that some of the vendors following the path blazed by i2 Ltd 20 years ago are charging hefty fees for their systems. Other useful products like Inxight’s ThingFinder have dropped completely off our radar. In short, there is feverish activity in advanced content processing.

Maybe even more drastic price cuts are a way to fame and fortune? The problem is that a clever lad or lass can push some interesting software via open source, or a giant troll of a company can just give advanced text processing away to get the maintenance and engineering services business.

Worth watching this pricing trend.

Stephen E Arnold, December 2, 2010

Freebie
