Microsoft and Its NLP Info Page

August 30, 2011

Microsoft is making a concerted effort to tackle natural language processing with its Redmond-based Natural Language Processing Group. The Microsoft page devoted to the group highlights current and older projects, downloads, and researchers involved.

The goal of the Natural Language Processing (NLP) group is to design and build software that will analyze, understand, and generate languages that humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person. This goal is not easy to reach. “Understanding” language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way.

Of particular interest are the recent publications authored by those in the group. Work includes everything from social media implementation, to multi-lingual Wikipedia content, to syntactic language modeling. The papers are well worth a read for anyone interested in the pressing field of natural language processing. Microsoft is definitely putting time and energy into the project, but it remains to be seen which of the tech giants will emerge the victor in the battle for natural language processing supremacy.

If you track NLP, including the newly minted azure chip consultants, you will want to monitor this aspect of Microsoft’s many, many search and text processing activities.

Emily Rae Aldridge, August 29, 2011

Sponsored by

Text Processing for Gender Info

August 27, 2011

Apparently researchers are proving what we have known all along: men and women communicate differently. In all seriousness, language patterns of tweets are being studied by the Mitre Corporation to determine if gender can be accurately assigned. Read more in “Study shows how some tweeters can identify their gender without even trying.”

As the Mitre team shows in their report, there are certain “buzzwords” that can often be found by analyzing the output of female tweeters. Phrases such as “chocolate” and “shopping” are among the most repeated for women tweeters. The most popular phrases for men, you ask? “Http” and “Google”…hey we never said either gender was more interesting than the other.

The team determined that the female/male ratio on Twitter is 55/45, so a guess of “female” would prove correct 55% of the time. However, the team found success 75% of the time through analysis of certain phrases, like those mentioned above. Perhaps such research could lead to targeted gender-specific advertising. It is interesting regardless, and the full report could be worth a look.
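To make the arithmetic concrete, here is a minimal Python sketch comparing an always-guess-the-majority baseline with the reported phrase-based result. Only the 55/45 split and the 75% figure come from the article; the function names and the error-reduction measure are our own illustrative assumptions and are not taken from the Mitre report.

```python
def majority_baseline(class_shares):
    """Accuracy obtained by always guessing the most common class."""
    return max(class_shares.values())


def relative_error_reduction(baseline_acc, model_acc):
    """Fraction of the baseline's mistakes that the model removes."""
    return (model_acc - baseline_acc) / (1.0 - baseline_acc)


# Figures quoted in the article: 55/45 female/male split, 75% reported accuracy.
shares = {"female": 0.55, "male": 0.45}
baseline = majority_baseline(shares)
reported = 0.75

gain = reported - baseline
reduction = relative_error_reduction(baseline, reported)

print(f"baseline accuracy: {baseline:.0%}")
print(f"gain over baseline: {gain * 100:.0f} percentage points")
print(f"share of baseline errors removed: {reduction:.0%}")
```

Seen this way, the phrase analysis adds about 20 percentage points over blind guessing, which removes a bit under half of the errors the blind guess would make.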

Emily Rae Aldridge, August 27, 2011

Janya Releases Semantic Analysis Platform

August 26, 2011

Janya, Inc., a leader in natural language processing, recently announced its release of Semantex 5.0, the most powerful version of its semantic analysis platform yet. The San Francisco Chronicle reports in, “Janya Announces Semantex™ 5.0 Multilingual Semantic Analysis Platform.”

Semantex has historically powered enterprise, SaaS social media analysis and government intelligence applications. The new enhancements reflected in Semantex™ 5.0 add significant value for existing customers and provide additional capabilities enabling a wide range of government, commercial, and academic uses. Semantex™ 5.0 can power solutions including market research, competitive intelligence, scientific, patent and medical data mining, e-discovery and compliance monitoring.

The new features mentioned include “improved language support, broader support for office document formats, new output formats, optimized pre-configured levels of processing, and highly scalable service oriented architecture support for Big Data deployments.”

Big data is a term hotly contested in tech circles, but we see the implication here. The more powerful and efficient the semantic analysis platform, the more meaning that can be derived from an enormous pool of data. Whether or not companies are ready to put the effort into analyzing and using such data remains to be seen. Regardless, Janya has produced a product that seems to be both efficient and highly customizable.

Emily Rae Aldridge, August 26, 2011

Access Innovations Expands, Supports Medical Coding

August 17, 2011

We learned from one of our readers that Access Innovations has expanded into an exciting new area: medical coding and analysis. In our opinion, the company is one of the leaders in taxonomy and controlled-term related systems and services, delivering solutions that reduce errors and costs.

According to our reader:

Access Innovations, Inc., a leader in data integrity and content creation, has announced the Access Innovations Integrity Initiative (AI³), a suite of tools and services for quality assurance and validation of medical coding. Access Innovations Integrity Initiative is not just for physicians, hospitals, and their data service providers. It also includes tools that give auditors and insurers the information management tools they need to quickly identify areas of noncompliance or suspicious activity.

Margie Hlava, whom we interviewed a few weeks ago, told us:

We are dedicated to productivity and cost savings as a company. This new application of our long-standing tool set enables a radical departure from other less consistent and accurate tools. These are the tools used in scholarly publishing and other information activities for many years. Applying ANSI standard-compliant Data Harmony tools to the health arena, coupled with our support of automated coding accuracy, means cost savings as well as increased precision.

Why the expansion at a time when dozens of search and content processing companies are struggling to find shelter in the financial hail storms which buffet many vendors? According to Ms. Hlava,

Coding mistakes or improper coding add to the cost of health care throughout the service chain. AI³ can lower those administrative costs. An initial consultation leads to the development of an automated audit-trigger analysis, identifying inefficiencies and inaccuracies based on records, notes, or other supplied data. A rules-based approach allows for the analysis of dynamic data sets, unlike a purely statistical approach, which quickly becomes suboptimal as more data is entered. The system can be used to quickly and accurately validate medical coding or to locate errors in existing documentation. Our technology delivers cost savings without compromise.

For more information about Access Innovations’ services, navigate to Access Innovations Integrity Initiative. For more information about the firm’s landmark technology, navigate to this product catalog.

How do I know the company’s approach works? We used this system when I was working at the commercial database company producing ABI/INFORM, Business Dateline, and other high value, profitable databases.

Stephen E Arnold, August 1, 2011

Exclusive Interview with Ana Athayde, Spotter SA

August 16, 2011

I have been monitoring Spotter SA, a European software development firm specializing in business intelligence, for several years. A lengthy interview with the founder, Ana Athayde, appears in the Search Wizards Speak section of the Web site.

The company has offices throughout Europe, the Middle East, and in the United States. The firm offers solutions in market sentiment, reputation management, risk assessment, crisis management, and competitive intelligence.

In the wide ranging interview, Ms. Athayde mentioned that she had been recognized as an exceptional manager, but she was quick to give credit to her staff and her chief technical officer, who was involved in the forward looking Datops SA content analytics service, now absorbed into the LexisNexis organization.

I asked her what pulled her into the vortex of content processing and analytics. She told me:

My background is business and marketing management in the sports field. In my first professional experience, I had to face major challenges in communication and marketing working for the International Olympic Committee. The amount of information published on those subjects was so huge that the first challenge was to solve the infoglut: not only to search for relevant information and build a list, but to understand opinions and assess reputation at an international level…. I decided to found a company to deliver a solution that could make use of information in textual form, what most people call unstructured data. But I knew that the information had to be presented in a way that a decision maker could actually use. Data dumps and row after row of numbers usually mean no one can tell what’s important without spending minutes, maybe hours deciphering the outputs.

I asked her about the firm’s technical plumbing. She replied:

The architecture of our own crawling system is based on proprietary methods to define and tune search scenarios. The “plumbing” is a fully scalable architecture which distributes tasks to schedulers. The content is processed, and we syndicate results. We use what we call “a source monitoring approach” which makes use of standard Web scraping methods. However, we have developed our own methods to adjust the scraping technology to each source in order to search all available documents. We extract metadata and relevant content from each page or content object. Only documents which have been assessed as fresh are processed and provided to users. This assessment is done by a proprietary algorithm based on rules involving such factors as the publication date. This means that each document collected by Spotter’s tracking and monitoring system is stamped with a publication date. This date is extracted by the Web scraping technology, from the document content. The type of behavior of the source; that is, the source has a known update cycle. We analyze the text content of the document. And we use the date and time stamp on the document itself.

Anyone who has tried to use the dates provided in some commercial systems realizes that, without accurate time context, much information is essentially useless until someone does additional research and analysis.
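Spotter’s freshness algorithm is proprietary, so we cannot show it. As a rough illustration of the kind of rule Ms. Athayde describes (a publication date chosen from several signals, then judged against the source’s known update cycle), here is a minimal Python sketch. Every name in it, the order in which the date signals are tried, and the one-cycle freshness threshold are our own assumptions, not Spotter’s method.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Document:
    """Hypothetical record for one crawled page or content object."""
    crawled_at: datetime
    content_date: Optional[datetime] = None  # date scraped from the text itself
    stamp_date: Optional[datetime] = None    # date and time stamp on the document


def publication_date(doc: Document) -> datetime:
    """Pick the first plausible date signal; fall back to the crawl time."""
    for candidate in (doc.content_date, doc.stamp_date):
        # Ignore missing dates and dates that claim to be later than the crawl.
        if candidate is not None and candidate <= doc.crawled_at:
            return candidate
    return doc.crawled_at


def is_fresh(doc: Document, update_cycle: timedelta, now: datetime) -> bool:
    """Treat a document as fresh if it is no older than one update cycle."""
    return now - publication_date(doc) <= update_cycle


now = datetime(2011, 8, 16, 12, 0)
daily = timedelta(days=1)  # assumed update cycle for a daily news source

recent = Document(crawled_at=datetime(2011, 8, 16, 9, 0),
                  content_date=datetime(2011, 8, 16, 8, 0))
stale = Document(crawled_at=datetime(2011, 8, 16, 9, 0),
                 stamp_date=datetime(2011, 8, 1, 10, 0))

print(is_fresh(recent, daily, now))
print(is_fresh(stale, daily, now))
```

The point of the sketch is simply that freshness depends on two things at once: which date one trusts, and how often the source is expected to change.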

To read the complete interview with Ms. Athayde, point your browser to the full text of our discussion. More information about Spotter SA is available at the firm’s Web site.

Stephen E Arnold, August 16, 2011

Freebie but you may support our efforts by buying a copy of The New Landscape of Enterprise Search

Are Text Analytics Companies Learning the Silicon Valley Way?

August 11, 2011

Seth Grimes, founding chair for the Text Analytics Summit, interviewed three experts in order to find out what it is that Silicon Valley and the world of text analytics have in common. The full interview, “What Can Text Analytics and Silicon Valley Learn From Each Other?” can be found at Text Analytics News.

Grimes reports, “Business markets are global, yet the Bay Area stands out as a source and consumer of innovative technologies and in particular, as a pace-setter for the online and social worlds. With the Text Analytics Summit coming to San Jose, I reached out to a few west-coasters who are making Valley text analytics news: Nitin Indurkhya, principal research scientist at eBay Research Labs; YY Lee, COO of FirstRain; and Michael Osofsky, co-founder and chief innovation officer at NetBase.”

Osofsky explains the balance between precision and recall in text analytics, and urges Silicon Valley to understand that time and energy should be devoted to experimenting to find a balance between the two principles. On the other hand, Silicon Valley’s fast and exciting nature could be a good influence on the text analytics world. Software can be launched, edited, and evolved quickly and risks can be taken. Absorbing a bit of that mentality could enable text analytics to be a little more innovative and adventurous.
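For readers new to the two measures Osofsky mentions, here is a minimal Python sketch using the standard textbook definitions of precision and recall, plus the F1 score that combines them. The document IDs and function names are invented for illustration and do not come from the interview.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items."""
    true_positives = len(retrieved & relevant)
    # Precision: what share of the retrieved items were actually relevant?
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: what share of the relevant items did we manage to retrieve?
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four documents are truly relevant, and a system
# returns three documents, two of which are relevant.
relevant = {1, 2, 3, 4}
retrieved = {3, 4, 5}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f} f1={f1_score(p, r):.2f}")
```

Returning more documents tends to raise recall and lower precision, and returning fewer does the reverse, which is why finding the balance takes experimentation.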

Indurkhya encourages the text analytics world to adopt the Valley principle of “fail often and fail quickly.” In this way, he explains, innovation happens and failure does not bog down the overall momentum.

Lee encourages text analytics companies to focus separately on each of three equally important components: 1) input, 2) internal process, and 3) presentation. Each of the components falls broadly under the heading of text analytics, and yet Lee stresses that each must be treated independently during development.

Grimes concludes with his own collective thoughts on the three interviews.

The key takeaways that I see in these responses involve problem and product focus, agility, and the desirability of pulling and integrating information from multiple sources with the application of a variety of analytical techniques, in order to achieve technical and business goals. There’s no “Do X, Y, and Z” formula here, but there is definitely a sense of the rewards that are possible if text analytics is done right.

Out-of-the-box thinking is beneficial in any business arena, but especially those known more for rigidity than innovation.

Emily Rae Aldridge, August 11, 2011

Sponsored by, publishers of The New Landscape of Enterprise Search. And our own Stephen E Arnold is speaking at this year’s November 2011 event.

The Text Analytics Summit has been a staple of the text analytics community for the past 7 years. To help this community grow, the Text Analytics Summit is finally coming to the west coast to foster new networking opportunities, promote more healthy knowledge sharing, and create strong, long-lasting business relationships. Text Analytics is essential for maximizing the customer experience, effectively monitoring the social media world, conducting first-class data analysis and research, and improving the business decision making process. Attend the summit to discover how to unlock the power of text analytics to leverage new and profitable business opportunities. Whether you’re interested in taking advantage of social media analytics, customer experience management, sentiment analysis, or Voice of the Customer, Text Analytics Summit West is the only place to get the inside information that you need to stay ahead of the competition and profit from text mining. For more information, click here.


Inteltrax: Top Stories, August 1 to August 5

August 8, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically about the careers that have either sprouted up or drastically changed during data analytics’ rise.

The first such story, “Big Data Architects in Demand,” involved the rising importance of digital architects in the expanding data warehouse field.

Following the trend of data warehouses evolving with job responsibilities, our story, “Warehouse Database Administrator Roles Changing” showed how this important role, too, is rapidly altering as newer and more niche-oriented technology becomes available.

Stepping away from the warehouse and into the halls of Congress, the article “Congressman Issa Weaves Government and Analytics Tighter” shows how the roles of politicians are changing and becoming more efficient thanks to analytic tools.

While the economy is sputtering in many areas, we’ve seen nothing but growth in business intelligence and data analytics since launching our site over a year ago. Routinely, analytics firms post record earnings, which leads to more job opportunities. We expect to see this employment market grow and evolve as more companies learn how analytics can help them.

Follow the Inteltrax news stream by visiting

Patrick Roland, Editor, Inteltrax, August 8, 2011

Sponsored by Digital Reasoning, developers of the next generation content analytics system Synthsys.

Inteltrax: Top Stories, July 25 to July 29

August 1, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, each dealing with some of the surprising negative news found in the analytics industry, and each offering a lesson for others. The system powering the Inteltrax service is called Augmentext.

Perhaps the most shocking tale of self-destruction we’ve seen in a while, “US Army is Not an Analytic Superpower,” detailed how this defense branch spent over $2 billion in taxpayer money on an analytic tool that never worked, when private companies could have been contracted for pennies on the dollar.

Another sort-of David and Goliath story, “SAS Falling Behind in the Cloud,” detailed how one-time business intelligence superpower SAS has rested on its laurels and, in the process, become a joke in the competitive and lucrative world of cloud-based analytics.

Finally, we served up a cautionary tale to those believing everything they read with “Parallel NFS Barely on the Radar.” This was a story of warning, as the company in question got some great press for its software, but has almost no history to back it up, which made us incredibly suspicious.

These three stories are, thankfully, the exception and not the rule. Every day we are wowed by news of analytics and business intelligence helping practically every business imaginable. However, there are always rotten eggs, even during an impressive time of growth. That’s why we’re here, to help readers sort out the good and the bad and make more informed decisions.

Follow the Inteltrax news stream by visiting

Patrick Roland, Editor, Inteltrax, August 1, 2011

Must Attend Conference: Text Analytics Symposium

July 25, 2011

Analysis on another level has released information on the newest trend in business technology and marketing: sentiment analysis. In the press release “Sentiment Analysis Symposium to Spotlight Agency, Finance, Technology, and Social Media Thought Leaders, November 9, 2011 in San Francisco,” we are able to gauge the excitement that is building behind this new approach to consumer marketing. The release asserted:

“Businesses are eager to extract and exploit consumer and market sentiment and opinion from the broad array of information sources online and in the enterprise,” said symposium chair Seth Grimes.

The conference is going to provide agency leaders with multiple solutions and networking opportunities. In its third year, the conference boasts participation from TripAdvisor, Saltlux, Acrolinx, and Amazon. The announcement added:

“They focus on online and social media measurement and analytics — on business intelligence for enterprise, Web, and social opinion sources — whether representing an enterprise-software leader or start-up, research firm, an online information provider, an agency, or a consultancy.”

The sentiment analysis approach to marketing, business, and technology is becoming more and more prevalent. It promises to be an ‘area to watch’ and may explode into an industry to invest in sometime in the near future.

We think the conference is a must attend affair. The US enterprise search conferences have been flapping and panting. The European conferences wobble around governance and content management. This conference is different. It has zing and substance.

Stephen E Arnold, July 25, 2011
