2013 Text Mining Summit Draws Record Crowd
November 2, 2013
The Linguamatics Blog recently reported on the outcome of the 2013 Text Mining Summit in the post “Pharma and Healthcare Come Together to See the Future of Text Mining.”
According to the article, this year’s event drew a record crowd of over 85 attendees who had the opportunity to listen to industry experts from the pharma and healthcare sector.
The article summarizes a few event highlights:
“Delegates were provided with an excellent opportunity to explore trends in text mining and analytics, natural language processing and knowledge discovery. Delegates discovered how I2E is delivering valuable intelligence from text in a range of applications, including the mining of scientific literature, news feeds, Electronic Health Records (EHRs), clinical trial data, FDA drug labels and more. Customer presentations demonstrated how I2E helps workers in knowledge driven organizations meet the challenge of information overload, maximize the value of their information assets and increase speed to insight.”
Events like the Text Mining Summit are excellent opportunities for members of the data analytics community to gather and share insights and new advances in the industry.
Jasmine Ashton, November 02, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Search Wizards Speak: Oleg Rogynskyy, Semantria
October 28, 2013
Semantria is a company focused on providing text and sentiment analysis to anyone. The company’s approach is to streamline the analysis of content so that in less than three minutes and for a nominal $1,000, the power of content processing can help answer tough business questions.
The firm’s founder is Oleg Rogynskyy, who has worked at Nstein (now part of Open Text) and Lexalytics. The idea for Semantria blossomed from Mr. Rogynskyy’s insight that text analytics technology was sufficiently mature to be useful to almost any organization or business professional.
I interviewed Mr. Rogynskyy on October 24, 2013. He told me:
At Semantria, we want to simplify and democratize access to text analytics technology. We want people to be able to get up and running in no time, with a small budget, and actually derive value from our technology. The classic story is you buy a system worth $100k and don’t deploy it.
Semantria focuses on a class of problems that a few years ago would have been outside the reach of many firms. He said:
We make it simple for our clients to solve the following problems: First, some organizations have too much text to read. For example, a Twitter stream or surveys with many responses. Also, there is the need to move quickly and reduce the time to get to market. Many survey results come with an expiry date before they’re irrelevant. Then there is reporting the information. Anyone can use their Excel smarts to build simple/interesting reports and visuals out of unstructured data. But that can take some time, and Semantria accelerates this step. Finally, users need to analyze text with the same impartiality each time. A human might see a glass as half full or half empty, but Semantria will always see a glass with water.
One of the most interesting aspects of Semantria is that the company delivers its solution as a cloud service. Mr. Rogynskyy observed:
We are happily in the cloud, and in the cloud we trust. We have Android and iOS software development kits in the works, so whoever wants to talk to our API from mobile devices will be doing it with ease very soon.
You can get more information about Semantria at https://semantria.com.
This interview is one of more than 60 full-text interviews with individuals who are deeply involved in search, content processing, and analytics. You can find the full series at www.arnoldit.com/search-wizards-speak.
Stephen E Arnold, October 28, 2013
Bitext Teams Up with Actuate for New Data Analysis Solution
October 4, 2013
Bitext.com recently reported on an exciting new partnership in the news release “Actuate and Bitext Announce Collaboration to Deliver Text Analytics Engines and Sentiment Analysis for Big Data Through BIRT.”
According to the article, Bitext, a leader in sentiment analysis, is teaming up with Actuate, a business intelligence software creator, to produce a new and improved text and semantics analytics solution. Additionally, Bitext has announced that it will also be creating a solution with Salesforce.com.
The article states:
“Our collaboration with Bitext – providers of advanced semantic solutions for social media, search, and more – extends the types of analysis that can be performed with Actuate’s commercial BIRT developer and end-user platform or solution, by adding the ability to score sentiment toward products and services,” said Josep Arroyo, VP of Analytic Solutions at Actuate. “Users of Actuate with Bitext can now tap more than just negative or positive sentiment analysis. They can also visualize anticipated risks, opportunities and threats for personalized insights, in a single display on any device.”
This new solution will be a huge asset to marketing professionals, as well as customer support specialists looking to use predictive analytics techniques to gain valuable insights into their customer base. For more information about Bitext, navigate to www.bitext.com.
Jasmine Ashton, October 4, 2013
Artificial Intelligence: Does Search Come Up Short?
September 21, 2013
In my view, artificial intelligence continues to capture attention. In actual use—particularly in search and content processing—AI evokes from me, “Aiiiiiiii.”
I read “The Unexpected Places Where Artificial Intelligence Will Emerge.” For investors who have pumped cash into various inventions that understand meaning, the article may surprise them. The future of AI is war, Google, Netflix, Amazon, spam, surveillance, robot space explorers, and financial trading.
The only challenge for AI is its lack of consistency. Smart systems work in certain circumstances and fail miserably in others. In my ISS lectures next week, I profile a number of systems which are alleged to be incredibly smart. The reality is that the systems are often rigged to generate expected outputs. The problem of “you don’t know what you don’t know” plagues the developers of these gee-whiz systems.
Will artificial intelligence improve search? Well, AI makes search easier for those who are happy to accept system outputs. For those who need to dig deeper, AI systems often produce results which do little to provide fine-grained detail or make it easy to identify suspect results.
For a good example of AI in action, look at Google search results when you are logged in. Examine Amazon recommendations closely. Better yet, watch the TV shows and films recommended for you by Netflix.
Stephen E Arnold, September 21, 2013
Text Process Made Simple
September 17, 2013
Nothing involving text seems simple: lines of words that go on for miles, often without proper punctuation or any at all. It needs to be cataloged and organized and tagged, but no one really wants to do that task. That is why “TextBlob: Simplified Text Processing” was born. What exactly is TextBlob? Here is the description straight from TextBlob’s homepage:
“TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, translation, and more.”
TextBlob is available for free download and has its own GitHub following. When it comes to installing the library, be aware that it relies on NLTK and pattern.en. Its features include part-of-speech tagging, JSON serialization, word and phrase frequencies, n-grams, word inflection, tokenization, language translation and detection, noun phrase extraction, and sentiment analysis.
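Two of the listed features, word frequencies and n-grams, are easy to picture with a few lines of plain Python. The sketch below uses only the standard library and is not TextBlob's API or implementation; the function names `tokenize`, `word_frequencies`, and `ngrams` are hypothetical and chosen for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


def ngrams(text, n=2):
    """Return every run of n consecutive tokens as a tuple."""
    tokens = tokenize(text)
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sample = "Text mining is simple. Text mining is not simple."
print(word_frequencies(sample).most_common(3))
print(ngrams(sample, n=2)[:3])
```

A library such as TextBlob wraps this kind of bookkeeping, plus much harder tasks like tagging and sentiment scoring, behind a single object, which is where the "simplified" in its name comes from.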
Once you have downloaded TextBlob, the Web site offers a comprehensive quick start guide that shows users how to implement the library and make the best use of it. Free libraries make the open source community go around and improve ease of use for all users. If you use TextBlob, be sure to share any of your own libraries.
Whitney Grace, September 17, 2013
SharePoint Search: An Open Source Widget
September 15, 2013
If you have SharePoint responsibilities, you know how fabulous Microsoft’s Swiss Army knife solution is. Let me explain. The “fabulousness” applies to consultants, integrators, and “experts” who can make the rusty blade cut better than it does once the system is installed.
I learned about “SharePoint 2013 Search Query Tool” from one of the ArnoldIT SharePoint experts. You can download the tool to test and debug search queries against the SharePoint 2013 REST API. The tool does not help improve either the system or the user queries, but I find this software interesting for three reasons:
After years of Microsoft innovation, there are still issues with getting relevant results. Ergo the open source tool.
SharePoint does not provide a native administrative function to perform this type of testing.
Open source may be edging toward SharePoint. If the baby steps mature, will an open source snap-in pop into being to replace the wild and crazy Fast Search & Transfer technology?
Stephen E Arnold, one of the world’s leading experts in information retrieval, said:
Fast Search is on a technical par with SharePoint. The idea that two flawed systems can cope with changing user needs, Big Data, and unexpected system interactions is making SharePoint software which boosts costs. Change may be forced on Microsoft and without warning.
Worth thinking about and checking out the free widget.
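For readers who have not seen what a query against the SharePoint 2013 search REST API looks like, here is a minimal sketch that only builds the request URL. It is not taken from the Search Query Tool itself. The host name and the helper `build_search_url` are hypothetical; the `/_api/search/query` endpoint and the `querytext`, `rowlimit`, and `selectproperties` parameters are the ones the REST API documents.

```python
from urllib.parse import quote


def build_search_url(site_url, query_text, row_limit=10, select_properties=None):
    """Build a GET URL for the SharePoint 2013 search REST endpoint.

    String parameters are wrapped in single quotes, as the REST API expects.
    This sketch does not handle query text that itself contains a single quote.
    """
    url = (
        f"{site_url.rstrip('/')}/_api/search/query"
        f"?querytext='{quote(query_text, safe='')}'"
        f"&rowlimit={int(row_limit)}"
    )
    if select_properties:
        # Managed property names are passed as one comma-separated string.
        props = quote(",".join(select_properties), safe=",")
        url += f"&selectproperties='{props}'"
    return url


# Hypothetical site URL, for illustration only.
print(build_search_url(
    "https://sharepoint.example.com/sites/team",
    "annual report",
    row_limit=5,
    select_properties=["Title", "Path"],
))
```

Sending such a URL with an authenticated GET request returns the result rows, which is the step a query tool of this kind lets an administrator repeat and inspect.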
Stuart Schram
Anonymizing Writing Style
September 11, 2013
Author J.K. Rowling recently learned firsthand how sophisticated analytics software has become. It was a linguistic analysis of the text of The Cuckoo’s Calling that unmasked her as “Robert Galbraith,” the popular crime novel’s author. (These tools were originally devised to combat plagiarism.) Now, I Programmer tells us in “Anonymouth Hides Identity,” open-source software is being crafted to foil such tools and give writers “stylometric anonymity.”
Whether a wordsmith just wants to enjoy a long-lost sense of anonymity, as the wildly successful author of the Harry Potter series attempted to do, or has more high-stakes reasons to hide behind a pen name, a team from Drexel University has the answer. The students from the school’s Privacy, Security, and Automation Lab (PSAL) just captured the Andreas Pfitzmann Best Student Paper Award at this year’s Privacy Enhancing Technologies Symposium for their paper on the subject. The article reveals:
The idea behind Anonymouth is that stylometry can be a threat in situations where individuals want to ensure their privacy while continuing to interact with others over the Internet. A presentation about the program cites two hypothetical scenarios:
*Alice the Anonymous Blogger vs. Bob the Abusive Employer
*Anonymous Forum vs. Oppressive Government. . . .
The JStylo-Anonymouth (JSAN) framework is work in progress at PSAL under the supervision of assistant professor of computer science, Dr. Rachel Greenstadt. It consists of two parts:
*JStylo – authorship attribution framework, used as the underlying feature extraction employing a set of linguistic features
*Anonymouth – authorship evasion (anonymization) framework, which suggests changes that need to be made.
The admittedly very small study discussed in the paper found that 80 percent of participants were able to produce anonymous documents “to a limited extent.” It also found certain constraints: it was more difficult to anonymize existing documents than new creations, for example. Still, this is an interesting development, and I am sure we will see more efforts in this direction.
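To give a sense of what “a set of linguistic features” can mean in practice, here is a small sketch that measures three simple properties of a writing sample. The features and the function name `stylometric_features` are assumptions chosen for illustration; they are not the feature set JStylo actually uses.

```python
import re


def stylometric_features(text):
    """Compute a few simple writing-style measurements for a text."""
    # Treat runs of ., ! or ? as sentence boundaries (a rough approximation).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        raise ValueError("text must contain at least one word")
    return {
        # Mean number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
        # Share of distinct words, a crude vocabulary-richness measure.
        "type_token_ratio": len(set(words)) / len(words),
    }


print(stylometric_features("I like cats. I like dogs!"))
```

An evasion tool in the spirit of Anonymouth would compare numbers like these against an author's known profile and suggest edits that move the document away from it.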
Cynthia Murrell, September 11, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Text Analytics and Semantic Processing Fuel New Web Paradigm
August 27, 2013
Often, we look more specifically at various apps and applications that address search needs. Sometimes, it is refreshing to find articles that take a step back and look at the overall paradigm shifts guiding the feature updates and new technology releases flooding the media. Forbes reports on the big picture in “NetAppVoice: How The Semantic Web Changes Everything. Again!”
Evolving out of the last big buzzword, big data, the semantic Web is now ubiquitous. Starting at the beginning, the article explains what semantic search allows people to do. A user can search for terms and retrieve results that go beyond keywords: metadata and other semantic technologies create associations between related concepts.
According to the article, hyperconnectivity is what will allow semantic search to deliver on its promise of meaningful insights:
For example, if we could somehow acquire all of the world’s knowledge, it wouldn’t make us smarter. It would just make us more knowledgeable. That’s exactly how search worked before semantics came along. In order for us to become smarter, we somehow need to understand the meaning of information. To do that we need to be able to forge connections in all this data, to see how each piece of knowledge relates to every other. In the semantic Web, we users provide the connections, through our social media activity. The patterns that emerge, the sentiment in the interactions—comments, shares, tweets, Likes, etc.—allow a very precise, detailed picture to emerge.
Enterprise organizations are in a unique position to achieve this hyperconnectivity, and they also have a growing list of technological solutions to help break down silos and promote safe and secure data access for appropriate users. For example, the text analytics and semantic processing in the Cogito Intelligence API enhance the ability to decipher meaning and insights from a multitude of content sources, including social media and unstructured corporate data.
Megan Feil, August 27, 2013
Oracle Focuses On New Full Text Query
August 26, 2013
Despite enterprise companies moving away from SQL databases to the more robust NoSQL, Oracle has updated its database to include new features, including an XQuery Full Text search. We found an article that examines how the new function will affect Oracle and where it seems to point. The Amis Technology Blog article, “Oracle Database 12c: XQuery Full Text,” explains that the XQuery Full Text search was made to handle unstructured XML content. It does so by extending the XQuery XMLDB language. This finally makes Oracle capable of working with all types of XML. The rest of the article focuses on the XQuery code.
When the new feature was tested against XML content, including Wikipedia content, the results were positive:
“During tests it proved very fast on English Wikipedia content (10++ Gb) and delivered the results within less than a second. But such a statement will only be picked up very efficiently if the new, introduced in 12c, corresponding Oracle XQuery Full-Text Index has been created.”
Oracle is trying to improve its technology as more of its users switch over to NoSQL databases. Improving the search function and other features keeps Oracle in the competition and proves that relational tables still have some kick in them. Interestingly enough, Oracle appears to be focusing its energies on MarkLogic’s technology to keep in the race.
Whitney Grace, August 26, 2013
How Forensic Linguistics Helped Unmask Rowling
August 23, 2013
By now most have heard that J.K. Rowling, famous for her astoundingly successful Harry Potter books, has been revealed as the author of the well-received crime novel “The Cuckoo’s Calling.” Time spoke to one of the analysts who discovered that author Robert Galbraith was actually Rowling, and shares what it learned in “J.K. Rowling’s Secret: a Forensic Linguist Explains how He Figured it Out.”
It started with a tip. Richard Brooks, editor of the British “Sunday Times,” received a mysterious tweet claiming that “Robert Galbraith” was a pen name for Rowling. Before taking the claim to the book’s publisher, Brooks called on Patrick Juola of Duquesne University to linguistically compare “The Cuckoo’s Calling” with the Potter books. Juola has had years of experience with forensic linguistics, specifically authorship attribution. Journalist Lily Rothman writes:
“The science is more frequently applied in legal cases, such as with wills of questionable origin, but it works with literature too. (Another school of forensic linguistics puts an emphasis on impressions and style, but Juola says he’s always worried that people using that approach will just find whatever they’re looking for.)
“But couldn’t an author trying to disguise herself just use different words? It’s not so easy, Juola explains. Word length, for example, is something the author might think to change — sure, some people are more prone to ‘utilize sesquipedalian lexical items,’ he jokes, but that can change with their audiences. What the author won’t think to change are the short words, the articles and prepositions. Juola asked me where a fork goes relative to a plate; I answered ‘on the left’ and wouldn’t ever think to change that, but another person might say ‘to the left’ or ‘on the left side.'”
One tool Juola uses is the free Java Graphical Authorship Attribution Program. After taking out rare words, names, and plot points, the software calculates the hundred most-used words from an author under consideration. Though a correlation does not conclusively prove that two authors are the same person, it can certainly help make the case. “Sunday Times” reporters took their findings to Galbraith’s/Rowling’s publisher, who confirmed the connection. Though Rowling has said that using the pen name was liberating, she (and her favorite charities) may be happy with the over 500,000 percent increase in “Cuckoo’s Calling” sales since her identity was uncovered.
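The idea of comparing authors by their small, habitual words can be sketched in a few lines. The example below is not the Java Graphical Authorship Attribution Program's method: instead of computing the hundred most-used words, it assumes a short, hand-picked list of function words (`FUNCTION_WORDS`, hypothetical) and compares two texts with cosine similarity.

```python
import math
import re
from collections import Counter

# A short, hand-picked list of English function words. This list is an
# assumption for illustration; real tools work with far larger word sets.
FUNCTION_WORDS = ["the", "a", "an", "of", "to", "in", "on", "at",
                  "by", "for", "with", "and", "but", "or"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


first = profile("The fork goes on the left")
second = profile("The fork goes to the left side")
print(cosine_similarity(first, second))
```

A score near 1.0 means the two samples lean on the same short words in similar proportions. As the post notes, that kind of correlation supports a case for shared authorship but does not prove it.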
The article notes that, though folks have been statistically analyzing text since the 1800s, our turn to e-books may make for a sharp increase in such revelations. Before that development, the process was slow even with computers, since textual analysis had to be preceded by the manual entry of texts via keyboard. Now, though, importing an entire tome is a snap. Rowling may just be the last famous author to enjoy the anonymity of a pen name, even for just a few months.
Cynthia Murrell, August 23, 2013