Text Processing for Gender Info
August 27, 2011
Apparently researchers are proving what we have known all along: men and women communicate differently. In all seriousness, language patterns of tweets are being studied by the Mitre Corporation to determine if gender can be accurately assigned. Read more in “Study shows how some tweeters can identify their gender without even trying.”
As the Mitre team shows in their report, there are certain “buzzwords” that can often be found by analyzing the output of female tweeters. Words such as “chocolate” and “shopping” are among the most repeated for women tweeters. The most popular words for men, you ask? “Http” and “Google”… hey, we never said either gender was more interesting than the other.
The team determined that the female/male ratio on Twitter is 55/45, so a guess of “female” would prove correct 55% of the time. However, the team found success 75% of the time through analysis of certain phrases, like those mentioned above. Perhaps such research could lead to targeted gender-specific advertising. It is interesting regardless, and the full report could be worth a look.
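The arithmetic behind the 55% baseline can be sketched in a few lines of Python. This is only an illustration: the cue words are the ones quoted in the post, but the sample tweets, the labels, and the function names are invented here and are not Mitre's data or method.

```python
# Hypothetical sketch: majority-class baseline vs. a naive cue-word guesser.
# The cue words come from the post; the tweets and labels below are invented.

FEMALE_CUES = {"chocolate", "shopping"}
MALE_CUES = {"http", "google"}


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always guessing the most common label."""
    most_common = max(set(labels), key=labels.count)
    return labels.count(most_common) / len(labels)


def guess_gender(tweet, default="female"):
    """Guess by counting cue words; fall back to the majority class on ties."""
    # Split URLs such as "http://..." so that "http" becomes its own token.
    words = tweet.lower().replace(":", " ").replace("/", " ").split()
    female_hits = sum(w in FEMALE_CUES for w in words)
    male_hits = sum(w in MALE_CUES for w in words)
    if female_hits > male_hits:
        return "female"
    if male_hits > female_hits:
        return "male"
    return default


def accuracy(tweets, labels):
    """Share of tweets whose guessed label matches the true label."""
    correct = sum(guess_gender(t) == y for t, y in zip(tweets, labels))
    return correct / len(labels)


# With a 55/45 split, always guessing "female" is right 55% of the time.
population = ["female"] * 55 + ["male"] * 45
print(majority_baseline_accuracy(population))  # 0.55

# Invented toy sample, only to show how a cue-word score is computed.
sample_tweets = [
    "Shopping then chocolate",
    "Check http://example.com",
    "Google maps is down",
    "Nice weather today",
]
sample_labels = ["female", "male", "male", "male"]
print(accuracy(sample_tweets, sample_labels))
```

Any classifier worth reporting has to beat the majority baseline, which is why the 75% figure is measured against 55% rather than against a coin flip.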
Emily Rae Aldridge, August 27, 2011
Sponsored by Pandia.com
Janya Releases Semantic Analysis Platform
August 26, 2011
Janya, Inc., a leader in natural language processing, recently announced its release of Semantex 5.0, the most powerful version of its semantic analysis platform yet. The San Francisco Chronicle reports in “Janya Announces Semantex™ 5.0 Multilingual Semantic Analysis Platform.”
Semantex has historically powered enterprise, SaaS social media analysis and government intelligence applications. The new enhancements reflected in Semantex™ 5.0 add significant value for existing customers and provide additional capabilities enabling a wide range of government, commercial, and academic uses. Semantex™ 5.0 can power solutions including market research, competitive intelligence, scientific, patent and medical data mining, e-discovery and compliance monitoring.
The new features mentioned include “improved language support, broader support for office document formats, new output formats, optimized pre-configured levels of processing, and highly scalable service oriented architecture support for Big Data deployments.”
Big data is a term hotly contested in tech circles, but we see the implication here. The more powerful and efficient the semantic analysis platform, the more meaning that can be derived from an enormous pool of data. Whether or not companies are ready to put the effort into analyzing and using such data remains to be seen. Regardless, Janya has produced a product that seems to be both efficient and highly customizable.
Emily Rae Aldridge, August 26, 2011
Sponsored by Pandia.com
Access Innovations Expands, Supports Medical Coding
August 17, 2011
We learned from one of our readers that Access Innovations has expanded into an exciting new area: medical coding and analysis. In our opinion, the company is one of the leaders in taxonomy and controlled-term related systems and services, delivering solutions that reduce errors and costs.
According to our reader:
Access Innovations, Inc., a leader in data integrity and content creation, has announced the Access Innovations Integrity Initiative (AI³), a suite of tools and services for quality assurance and validation of medical coding. Access Innovations Integrity Initiative is not just for physicians, hospitals, and their data service providers. It also includes tools that give auditors and insurers the information management tools they need to quickly identify areas of noncompliance or suspicious activity.
Margie Hlava, whom we interviewed a few weeks ago, told us:
We are dedicated to productivity and cost savings as a company. This new application of our long-standing tool set enables a radical departure from other less consistent and accurate tools. These are the tools used in scholarly publishing and other information activities for many years. Applying ANSI standard-compliant Data Harmony tools to the health arena, coupled with our support of automated coding accuracy, means cost savings as well as increased precision.
Why the expansion at a time when dozens of search and content processing companies are struggling to find shelter in the financial hail storms which buffet many vendors? According to Ms. Hlava,
Coding mistakes or improper coding adds to the cost of health care throughout the service chain. AI³ can lower those administrative costs. An initial consultation leads to the development of an automated audit-trigger analysis, identifying inefficiencies and inaccuracies based on records, notes, or other supplied data. A rules-based approach allows for the analysis of dynamic data sets, unlike a purely statistical approach, which quickly becomes suboptimal as more data is entered. The system can be used to quickly and accurately validate medical coding or to locate errors in existing documentation. Our technology delivers cost savings without compromise.
For more information about Access Innovations’ services, navigate to Access Innovations Integrity Initiative. For more information about the firm’s landmark technology, navigate to this product catalog.
How do I know the company’s approach works? We used this system when I was working at the commercial database company producing ABI/INFORM, Business Dateline, and other high value, profitable databases.
Stephen E Arnold, August 1, 2011
Sponsored by Pandia.com
Exclusive Interview with Ana Athayde, Spotter SA
August 16, 2011
I have been monitoring Spotter SA, a European software development firm specializing in business intelligence, for several years. A lengthy interview with the founder, Ana Athayde, appears in the Search Wizards Speak section of the ArnoldIT.com Web site.
The company has offices throughout Europe, the Middle East, and in the United States. The firm offers solutions in market sentiment, reputation management, risk assessment, crisis management, and competitive intelligence.
In the wide ranging interview, Ms. Athayde mentioned that she had been recognized as an exceptional manager, but she was quick to give credit to her staff and her chief technical officer, who was involved in the forward looking Datops SA content analytics service, now absorbed into the LexisNexis organization.
I asked her what pulled her into the vortex of content processing and analytics. She told me:
My background is business and marketing management in the sports field. In my first professional experience, I had to face major challenges in communication and marketing working for the International Olympic Committee. The amount of information published on those subjects was so huge that the first challenge was to solve the infoglut: not only to search for relevant information and build a list, but to understand opinions and assess reputation at an international level….I decided to fund a company to deliver a solution that could make use of information in textual form, what most people call unstructured data. But I knew that the information had to be presented in a way that a decision maker could actually use. Data dumps and row after row of numbers usually mean no one can tell what’s important without spending minutes, maybe hours deciphering the outputs.
I asked her about the firm’s technical plumbing. She replied:
The architecture of our own crawling system is based on proprietary methods to define and tune search scenarios. The “plumbing” is a fully scalable architecture which distributes tasks to schedulers. The content is processed, and we syndicate results. We use what we call “a source monitoring approach” which makes use of standard Web scraping methods. However, we have developed our own methods to adjust the scraping technology to each source in order to search all available documents. We extract metadata and relevant content from each page or content object. Only documents which have been assessed as fresh are processed and provided to users. This assessment is done by a proprietary algorithm based on rules involving such factors as the publication date. This means that each document collected by Spotter’s tracking and monitoring system is stamped with a publication date. This date is extracted by the Web scraping technology, from the document content. The type of behavior of the source; that is, the source has a known update cycle. We analyze the text content of the document. And we use the date and time stamp on the document itself.
Anyone who has tried to use the dates provided in some commercial systems realizes that, without accurate time context, much information is essentially useless unless one invests in additional research and analysis.
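Spotter's freshness algorithm is proprietary, so the Python below is only a guess at the general shape of a rule-based freshness check built from the factors Ms. Athayde lists: a date extracted from the content, the document's own time stamp, and the source's known update cycle. The `Document` fields, the function names, and the "within one update cycle" rule are all assumptions made for illustration.

```python
# Hypothetical sketch of a rule-based freshness check. Nothing here is
# Spotter's actual code; field names and thresholds are assumptions.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Document:
    url: str
    extracted_date: Optional[datetime]    # date parsed from the page content
    header_timestamp: Optional[datetime]  # time stamp carried by the document
    source_update_cycle: timedelta        # how often the source is known to publish


def publication_date(doc: Document) -> Optional[datetime]:
    """Prefer the date found in the content; fall back to the document's stamp."""
    return doc.extracted_date or doc.header_timestamp


def is_fresh(doc: Document, now: datetime, cycles: int = 1) -> bool:
    """Treat a document as fresh if it was published within `cycles` update cycles."""
    published = publication_date(doc)
    if published is None:
        return False  # no usable date, so the document cannot be assessed
    age = now - published
    return timedelta(0) <= age <= doc.source_update_cycle * cycles


now = datetime(2011, 8, 16, 12, 0)
daily = timedelta(days=1)
recent = Document("http://example.com/a", datetime(2011, 8, 16, 9, 0), None, daily)
stale = Document("http://example.com/b", None, datetime(2011, 8, 1), daily)
undated = Document("http://example.com/c", None, None, daily)

print(is_fresh(recent, now))   # True
print(is_fresh(stale, now))    # False
print(is_fresh(undated, now))  # False
```

Tying the threshold to each source's update cycle means a daily newspaper and a quarterly journal are not held to the same clock, which seems to be the point of tracking source behavior at all.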
To read the complete interview with Ms. Athayde, point your browser to the full text of our discussion. More information about Spotter SA is available at the firm’s Web site www.spotter.com.
Stephen E Arnold, August 16, 2011
Freebie but you may support our efforts by buying a copy of The New Landscape of Enterprise Search
Are Text Analytics Companies Learning the Silicon Valley Way?
August 11, 2011
Seth Grimes, founding chair for the Text Analytics Summit, interviewed three experts in order to find out what it is that Silicon Valley and the world of text analytics have in common. The full interview, “What Can Text Analytics and Silicon Valley Learn From Each Other?” can be found at Text Analytics News.
Grimes reports, “Business markets are global, yet the Bay Area stands out as a source and consumer of innovative technologies and in particular, as a pace-setter for the online and social worlds. With the Text Analytics Summit coming to San Jose, I reached out to a few west-coasters who are making Valley text analytics news: Nitin Indurkhya, principal research scientist at eBay Research Labs; YY Lee, COO of FirstRain; and Michael Osofsky, co-founder and chief innovation officer at NetBase.”
Osofsky explains the balance between precision and recall in text analytics, and urges Silicon Valley to understand that time and energy should be devoted to experimenting to find a balance between the two principles. On the other hand, Silicon Valley’s fast and exciting nature could be a good influence on the text analytics world. Software can be launched, edited, and evolved quickly and risks can be taken. Absorbing a bit of that mentality could enable text analytics to be a little more innovative and adventurous.
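For readers who want the precision and recall trade-off in concrete terms, here is a small Python sketch. The document IDs and the "strict" and "loose" result sets are made up; only the standard definitions of precision, recall, and the F1 score are assumed.

```python
# Illustrative sketch of precision, recall, and F1. The document IDs and
# the two result sets are invented for the example.

def precision_recall(predicted, relevant):
    """Return (precision, recall) for a set of predicted items."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


relevant_docs = {1, 2, 3, 4}

# A strict system returns little, but everything it returns is right.
strict = {1, 2}
# A loose system finds everything, but half of what it returns is noise.
loose = {1, 2, 3, 4, 5, 6, 7, 8}

print(precision_recall(strict, relevant_docs))  # (1.0, 0.5)
print(precision_recall(loose, relevant_docs))   # (0.5, 1.0)
```

Neither system is "better" in the abstract; which end of the trade-off to favor depends on whether missed items or false alarms cost more in the application at hand.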
Indurkhya encourages the text analytics world to adopt the Valley principle of “fail often and fail quickly.” In this way, he explains, innovation happens and failure does not bog down the overall momentum.
Lee encourages text analytics companies to focus separately on each of three equally important components: 1) Input 2) Internal process 3) Presentation. Each of the three falls broadly under the umbrella of text analytics, yet Lee stresses that each must be treated independently during development.
Grimes concludes with his own collective thoughts on the three interviews.
The key takeaways that I see in these responses involve problem and product focus, agility, and the desirability of pulling and integrating information from multiple sources with the application of a variety of analytical techniques, in order to achieve technical and business goals. There’s no “Do X, Y, and Z” formula here, but there is definitely a sense of the rewards that are possible if text analytics is done right.
Out-of-the-box thinking is beneficial in any business arena, but especially those known more for rigidity than innovation.
Emily Rae Aldridge, August 11, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search. And our own Stephen E Arnold is speaking at this year’s November 2011 event.
The Text Analytics Summit has been a staple of the text analytics community for the past 7 years. To help this community grow, the Text Analytics Summit is finally coming to the west coast to foster new networking opportunities, promote more healthy knowledge sharing, and create strong, long-lasting business relationships. Text Analytics is essential for maximizing the customer experience, effectively monitoring the social media world, conducting first-class data analysis and research, and improving the business decision making process. Attend the summit to discover how to unlock the power of text analytics to leverage new and profitable business opportunities. Whether you’re interested in taking advantage of social media analytics, customer experience management, sentiment analysis, or Voice of the Customer, Text Analytics Summit West is the only place to get the inside information that you need to stay ahead of the competition and profit from text mining. For more information, click here.
Inteltrax: Top Stories, August 1 to August 5
August 8, 2011
Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically about the careers that have either sprouted up or drastically changed during data analytics’ rise.
The first such story, “Big Data Architects in Demand” involved the rising importance of digital architects in the expanding data warehouse field.
Following the trend of data warehouses evolving with job responsibilities, our story, “Warehouse Database Administrator Roles Changing” showed how this important role, too, is rapidly altering as newer and more niche-oriented technology becomes available.
Stepping away from the warehouse and into the halls of Congress, the article “Congressman Issa Weaves Government and Analytics Tighter” shows how the roles of politicians are changing and becoming more efficient thanks to analytic tools.
While the economy is sputtering in many areas, we’ve seen nothing but growth in business intelligence and data analytics since launching our site over a year ago. Routinely, analytics firms post record earnings, which leads to more job opportunities. We expect to see this employment market grow and evolve as more companies learn how analytics can help them.
Follow the Inteltrax news stream by visiting www.inteltrax.com
Patrick Roland, Editor, Inteltrax, August 8, 2011
Sponsored by Digital Reasoning, developers of the next generation content analytics system Synthsys.
Inteltrax: Top Stories, July 25 to July 29
August 1, 2011
Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, each dealing with some of the surprising negative news found in the analytics industry, and each a lesson others can learn from. The system powering the Inteltrax service is called Augmentext.
Perhaps the most shocking tale of self-destruction we’ve seen in a while, “US Army is Not an Analytic Superpower,” detailed how this defense branch spent over $2 billion in taxpayer money on an analytic tool that never worked, when private companies could have been contracted for pennies on the dollar.
Another sort-of David and Goliath story, “SAS Falling Behind in the Cloud,” detailed how one-time business intelligence superpower SAS has rested on its laurels and, in the process, become a joke in the competitive and lucrative world of cloud-based analytics.
Finally, we served up a cautionary tale to those believing everything they read with “Parallel NFS Barely on the Radar.” This was a story of warning, as the company in question got some great press for its software, but has almost no history to back it up, which made us incredibly suspicious.
These three stories are, thankfully, the exception and not the rule. Every day we are wowed by news of analytics and business intelligence helping practically every business imaginable. However, there are always rotten eggs, even during an impressive time of growth. That’s why we’re here, to help readers sort out the good and the bad and make more informed decisions.
Follow the Inteltrax news stream by visiting www.inteltrax.com
Patrick Roland, Editor, Inteltrax, August 1, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Must Attend Conference: Text Analytics Symposium
July 25, 2011
Analysis on another level
PRWeb.com has released information on the newest trend in business technology and marketing: sentiment analysis. In the press release “Sentiment Analysis Symposium to Spotlight Agency, Finance, Technology, and Social Media Thought Leaders, November 9, 2011 in San Francisco,” we are able to gauge the excitement that is building behind this new approach to consumer marketing. The release asserted:
“Businesses are eager to extract and exploit consumer and market sentiment and opinion from the broad array of information sources online and in the enterprise,” said symposium chair Seth Grimes.
The conference is going to provide agency leaders with multiple solutions and networking opportunities. In its third year, the conference boasts participation from TripAdvisor, Saltlux, Acrolinx, and Amazon. The announcement added:
“They focus on online and social media measurement and analytics — on business intelligence for enterprise, Web, and social opinion sources — whether representing an enterprise-software leader or start-up, research firm, an online information provider, an agency, or a consultancy.”
The sentiment analysis approach to marketing, business and technology is becoming more and more prevalent. It promises to be an ‘area to watch’ and may explode into an industry to invest in sometime in the near future.
We think the conference is a must attend affair. The US enterprise search conferences have been flapping and panting. The European conferences wobble around governance and content management. This conference is different. It has zing and substance.
Stephen E Arnold, July 25, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Attensity Command Center Gives Clients Control
July 21, 2011
“Attensity Looks to Give Brands a Window into Social Media,” reports the Silicon Valley BizBlog. Attensity is touting its new Command Center software, which takes social media analysis a step further. It’s designed to display real-time information continuously to its customers’ employees. What caught my eye was this passage:
The Attensity Command Center is basically a bank of monitors and the back end software to run the monitors. Using proprietary, patented text analysis algorithms, the platform categorizes incoming tweets by subject, sentiment, and geography, etc. The goal is to aggregate and visualize what’s being said online, so that the customers can know in real time how many people are talking about them and what they’re saying.
Writer Jon Xavier experienced a demo of the product, and was suitably impressed. His only issue was that the passing tweets moved too fast to read. He noted that to make full use of the software, a company would have to dedicate a couple of employees to monitoring and acting on the information.
Nope, it is not virtual. Will social media augment this reality? Image source: http://goo.gl/i3TIb
The interest in social media is fascinating. Once the Internet was for rocket scientists. Now the Internet is the place to stroll. A digital Las Ramblas. When gizmos are embedded in the human body, the Information Highway takes on an interesting shape. The metaphors used to describe the next big thing will be interesting. For now, Attensity touts control.
With this offering, Attensity amps up marketing in the ad sector. Will it be enough to make headway against the Google+ marketing cyclone?
Stephen E Arnold, July 21, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Symantec Snaps Up Clearwell to Enter E Discovery Market
July 20, 2011
I do some odd jobs for Enterprise Technology Management. Among them is hosting podcasts on various topics. Last week we did a podcast with several luminaries in the e discovery market. E Discovery is a term used to describe the content and text processing required to figure out what is in unstructured content gathered in a legal matter. There doesn’t have to be a lawsuit to trigger a company’s running an e Discovery project, but unlike search, e Discovery beckons legal eagles.
We read the article “Symantec acquires Clearwell Systems for $390m.” Perhaps best known for its antivirus software, Symantec also offers an array of information management solutions. Clearwell Systems specializes in e-discovery tools, used in response to litigation and other legal/investigative matters.
Symantec gains much with the acquisition:
Symantec notes the acquisition will add archiving, backup and eDiscovery offerings to its existing offerings, enabling it to offer a broader set of information management capabilities to customers. The deal will help Symantec provide future product integration opportunities with Symantec backup and security, Symantec NetBackup, Data Loss Prevention and Data Insight, the company said.
This acquisition moves e-discovery to the cloud, while continuing the appliance approach.
On the podcast I learned:
- There will be a push for more hosted services. Autonomy has done a good job with its Zantaz acquisition and its hosted services, so Symantec is going down a route that leads to a payoff.
- The Clearwell approach will continue to feature its rapid deployment model. I associate the phrase “rocket docket,” which connotes speedy service, with Clearwell.
- The Clearwell report and user audit functions will be expanded and enhanced. I saw a Clearwell report and watched an attorney pop it in an envelope for delivery to another attorney. The system impressed me because the report did not require any fiddling by the attorney. Good stuff.
Naturally, other new services are planned. Stay tuned.
Cynthia Murrell, July 14, 2011