December 11, 2013
I read “Natural Language Processing in the Kitchen.” The post was particularly relevant because I had worked through “The Main Trick in Machine Learning.” The essay does an excellent job of explaining coefficients (what I call, for ease of recall, “thresholds”). The idea is that machine learning requires a human to make certain judgments. Autonomy IDOL uses Bayesian methods, and the company has for many years urged licensees to “train” the IDOL system. Not only that, successful Bayesian systems, like a young child, have to be prodded or retrained. How much and how often depends on the child. For Bayesian-like systems, the “how often” and “how much” vary by the licensees’ content contexts.
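To make the “train” and “threshold” ideas concrete, here is a minimal sketch of a toy Bayesian text classifier. It is not how Autonomy IDOL or the system in either article works; the class name, labels, sample sentences, and the 0.7 cutoff are all hypothetical. The point is only that a person supplies the labeled examples and picks the threshold.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Toy two-class naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"relevant": Counter(), "other": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        # "Training" is just counting words in human-labeled examples.
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def prob_relevant(self, text):
        vocab = set(self.word_counts["relevant"]) | set(self.word_counts["other"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            # Smoothed class prior, then smoothed per-word likelihoods.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(counts.values()) + len(vocab) + 1
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        # Turn the two log scores into P(relevant | text).
        diff = log_scores["other"] - log_scores["relevant"]
        return 1.0 / (1.0 + math.exp(diff))


THRESHOLD = 0.7  # hypothetical cutoff: a human judgment call, not a learned value

clf = TinyNaiveBayes()
clf.train("simmer the garlic and tomato sauce", "relevant")
clf.train("bake the bread dough until golden", "relevant")
clf.train("quarterly revenue and investor funding", "other")
clf.train("server license and support contract", "other")

p_recipe = clf.prob_relevant("simmer tomato sauce")
p_finance = clf.prob_relevant("investor funding")
is_recipe = p_recipe >= THRESHOLD
```

Retraining in this sketch means calling `train` again with fresh labeled examples. How often that is needed depends on how far new content drifts from what the classifier has already seen.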
Now back to the Los Angeles Times’ excellent article about indexing and classifying a small set of recipes. Here’s the quote to note:
Computers can really only do so much.
When one jots down the programming and tuning work required to index recipes, keep in mind “The Main Trick in Machine Learning.” There are three important lessons I draw from the boundary between these two write ups:
- Smart software requires programming and fiddling. At the present time (December 2013), this reality is as it has been for the last 50 years, maybe more.
- The humans fiddling with or setting up the content processing system have to be pretty darned clever. The notion of “user friendliness” is strongly disabused by these two articles. Flashy graphics and marketers’ cooing are not going to cut the mustard or the sirloin steak.
- The properly set up system with filtered information processed without some human intervention hits 98 percent accuracy. The main point is that relevance is a result of humans, software, and consistent, on point content.
How many enterprise search and content processing vendors explain that a failure to put appropriate resources toward the search or content processing implementation guarantees some interesting issues? Among them, systems will routinely deliver results that are not germane to the user’s query.
The root of dissatisfaction with incumbent search and retrieval systems is not the systems themselves. In my opinion, most are quite similar, differing only in relatively minor details. (For examples of the similarity, review the reports at Xenky’s Vendor Profiles page.)
How many vendors have been excoriated because their customers failed to provide the cash, time, and support necessary to deliver a high-performance system? My hunch is that the vendors are held responsible for failures that are predestined by licensees’ desire to get the best deal possible and their belief that magic just happens without the difficult, human-centric work that is absolutely essential for success.
Stephen E Arnold, December 11, 2013
December 11, 2013
I read about Palantir and its successful funding campaign in “Palantir’s Latest Round Valuing It at $9B Swells to $107.8M in New Funding.” Compared to the funding for ordinary search and content processing companies, Palantir is obviously able to attract investors better than most of the other companies that make sense out of data.
If you run a query for “Palantir” on Beyond Search, you will get links to articles about the company’s previous funding and to a couple of stories about the company’s interaction with IBM i2 related to an allegation about Palantir’s business methods.
Image from the Louisiana Lottery.
I find Palantir interesting for three reasons.
First, it is able to generate significant buzz in police and intelligence entities in a number of countries. Based on what I have heard at conferences, the Palantir visualizations knock the socks off highly placed officials who want killer graphics in their personal slide presentations.
Second, the company has been nosing into certain financial markets. The idea is that the Palantir methods will give some of the investment outfits a better way to figure out what’s going up and what’s going down. The visuals are good, I have heard, but the Palantir analytics are perceived, if my sources are accurate, as better than those from companies like IBM SPSS, Digital Reasoning, Recorded Future, and similar analytics firms.
Third, the company may have moved into a new business sector. The firm’s success in fund raising raises the question, “Is Palantir becoming a vehicle to raise more and more cash?”
Palantir is worth monitoring. The visualizations and the math are not really a secret sauce. The magic ingredient at Palantir may be its ability to sell its upside to investors. Is Palantir introducing a new approach to search and content processing? The main business of the company could be raising more and more money.
Stephen E Arnold, December 11, 2013
November 25, 2013
With Google becoming more difficult to use, many professionals need a way to locate, filter, and obtain high value information that works. Silobreaker is an online service and system that delivers actionable information.
The co-founder of Silobreaker said in an exclusive interview for Search Wizards Speak:
I learned that in most of the organizations, information was locked in separate silos. The information in those silos was usually kept under close control by the silo manager. My insight was that if software could make available to employees the information in different silos, the organization would reap an enormous gain in productivity. So the idea was to “break” down the information and knowledge silos that exist within companies, organizations and mindsets.
And knock down barriers the system has. Silobreaker’s popularity is surging. The most enthusiastic supporters of the system come from the intelligence community, law enforcement, analysts, and business intelligence professionals. A user’s query retrieves up-to-the-minute information from Web sources, commercial services, and open source content. The results are available as a series of summaries, full text documents, relationship maps among entities, and other report formats. The user does not have to figure out which item is an advertisement. The Silobreaker system delivers muscle, not fatty tissue.
Mr. Bjore, a former intelligence officer, adds:
Silobreaker is an Internet and a technology company that offers products and services which aggregate, analyze, contextualize and bring meaning to the ever-increasing amount of digital information.
Underscoring the difference between Silobreaker and other online systems, Mr. Bjore points out:
What sets us apart is not only the Silobreaker technology and our commitment to constant innovation. Silobreaker embodies the long term and active experience of having a team of users and developers who can understand the end user environment and challenges. Also, I want to emphasize that our technology is one integrated technology that combines access, content, and actionable outputs.
The ArnoldIT team uses Silobreaker in our intelligence-related work. We include a profile of the system in our lectures about next-generation information gathering and processing systems.
Stephen E Arnold, November 25, 2013
November 14, 2013
I read a quite remarkable news release. The title? Grab your blood pressure medicine because you may “explode.”
I expected a sign to warn me off. Was it safe to read about such a potentially powerful technology?
Straightaway I poked through my information about search vendors. I did not recall the name “Expertmaker.” I think it is catchy, echoing the Italian outfit Expert System.
Expertmaker is located at www.expertmaker.com. The company offers the following products:
- Products that are “an online solution and/or mobile solution.”
- Big Data Anti Churn. I am not exactly sure what this means, and I did not want to contact Expertmaker to learn more.
- Flow, a virtual assistant platform.
The technology is positioned as “artificial intelligence.” The description of the company’s technology is located at this link. I scanned the information on the Expertmaker Web site. I noted some points that struck me as interesting, particularly in relation to the news release that triggered my interest. (Who says news releases are irrelevant? Expertmaker has my attention. I suppose that is a good thing, but there are other possible viewpoints too. My attention can be annoying, but, hey, this is a free blog about going “beyond search.”)
First, the label “artificial intelligence” is visible in the description. The AI angle is “machine learning and evolutionary computing.” The point is that the system performs functions that would be difficult using an old fashioned database like DB2, Oracle, or SQL Server. (I assume that the owners of these traditional databases will have some counter arguments to offer.)
Second, the system makes it possible to build search-based applications. (Dassault Exalead has been beating this tom tom for six or seven years. I presume that the Cloud 360 technology is relegated to the used car lot because Expertmaker has rolled into the search dealership.)
Third, a development environment is available, including a “Desktop Artificial Intelligence Toolkit.” There are “solvers.” There are various AI technologies. There is knowledge discovery. There is a “published solution.” And there is this component:
Semantic, value based, meta-data structures allow high precision understanding and value based searches. With the solution you can create your own semantic structures for handling complex solutions.
Okay, this is pretty standard fare for search start ups. I am not sure what the system does, but I looked at examples, including screenshots.
November 2, 2013
The Linguamatics Blog recently reported on the outcome of the 2013 Text Mining Summit in the post “Pharma and Healthcare Come Together to See the Future of Text Mining.”
According to the article, this year’s event drew a record crowd of over 85 attendees who had the opportunity to listen to industry experts from the pharma and healthcare sector.
The article summarizes a few event highlights:
“Delegates were provided with an excellent opportunity to explore trends in text mining and analytics, natural language processing and knowledge discovery. Delegates discovered how I2E is delivering valuable intelligence from text in a range of applications, including the mining of scientific literature, news feeds, Electronic Health Records (EHRs), clinical trial data, FDA drug labels and more. Customer presentations demonstrated how I2E helps workers in knowledge driven organizations meet the challenge of information overload, maximize the value of their information assets and increase speed to insight.”
Events like the Text Mining Summit are excellent opportunities for members of the data analytics community to gather and share their insights and new advances in the industry.
Jasmine Ashton, November 02, 2013
October 28, 2013
Semantria is a company focused on providing text and sentiment analysis to anyone. The company’s approach is to streamline the analysis of content so that in less than three minutes and for a nominal $1,000, the power of content processing can help answer tough business questions.
The firm’s founder is Oleg Rogynskyy, who has worked at Nstein (now part of Open Text) and Lexalytics. The idea for Semantria blossomed from Mr. Rogynskyy’s insight that text analytics technology was sufficiently mature that it could be useful to almost any organization or business professional.
I interviewed Mr. Rogynskyy on October 24, 2013. He told me:
At Semantria, we want to simplify and democratize access to text analytics technology. We want people to be able to get up and running in no time, with a small budget, and actually derive value from our technology. The classic story is you buy a system worth $100k and don’t deploy it.
Semantria focuses on a class of problems that a few years ago would have been outside the reach of many firms. He said:
We make it simple for our clients to solve the following problems: First, some organizations have too much text to read. For example, a Twitter stream or surveys with many responses. Also, there is the need to move quickly and reduce the time to get to market. Many survey results come with an expiry date before they’re irrelevant. Then there is reporting the information. Anyone can use their Excel smarts to build simple/interesting reports and visuals out of unstructured data. But that can take some time, and Semantria accelerates this step. Finally, users need to analyze text with the same impartiality each time. A human might see a glass as half full or half empty, but Semantria will always see a glass with water.
One of the most interesting aspects of Semantria is that the company delivers its solution as a cloud service. Mr. Rogynskyy observed:
We are happily in the cloud, and in the cloud we trust. We have android and iOS software development kits in the works, so whoever wants to talk to our API from mobile devices will be doing it with ease very soon.
You can get more information about Semantria at https://semantria.com.
This interview is one of more than 60 full-text interviews with individuals who are deeply involved in search, content processing, and analytics. You can find the full series at www.arnoldit.com/search-wizards-speak.
Stephen E Arnold, October 28, 2013
October 4, 2013
Bitext.com recently reported on an exciting new partnership in the news release “Actuate and Bitext Announce Collaboration to Deliver Text Analytics Engines and Sentiment Analysis for Big Data Through BIRT.”
According to the article, Bitext, a leader in sentiment analysis, is teaming up with Actuate, a business intelligence software creator, to produce a new and improved text and semantics analytics solution. Additionally, Bitext has announced that it will also be creating a solution with Salesforce.com.
The article states:
“‘Our collaboration with Bitext – providers of advanced semantic solutions for social media, search, and more – extends the types of analysis that can be performed with Actuate’s commercial BIRT developer and end-user platform or solution, by adding the ability to score sentiment toward products and services,’ said Josep Arroyo, VP of Analytic Solutions at Actuate. ‘Users of Actuate with Bitext can now tap more than just negative or positive sentiment analysis. They can also visualize anticipated risks, opportunities and threats for personalized insights, in a single display on any device.’”
This new solution will be a huge asset to marketing professionals, as well as customer support specialists looking to use predictive analytics techniques to gain valuable insights into their customer base. For more information about Bitext, navigate to www.bitext.com.
Jasmine Ashton, October 4, 2013
September 21, 2013
In my view, artificial intelligence continues to capture attention. In actual use—particularly in search and content processing—AI evokes from me, “Aiiiiiiii.”
I read “The Unexpected Places Where Artificial Intelligence Will Emerge.” For investors who have pumped cash into various inventions that understand meaning, the article may surprise them. The future of AI is war, Google, Netflix, Amazon, spam, surveillance, robot space explorers, and financial trading.
The only challenge for AI is its lack of consistency. Smart systems work in certain circumstances and fail miserably in others. In my ISS lectures next week, I profile a number of systems which are alleged to be incredibly smart. The reality is that the systems are often rigged to generate expected outputs. The problem of “you don’t know what you don’t know” plagues the developers of these gee-whiz systems.
Will artificial intelligence improve search? Well, AI makes search easier for those who are happy to accept system outputs. For those who need to dig deeper, AI systems often produce results which do little to provide fine-grained detail or make it easy to identify suspect results.
For a good example of AI in action, look at Google search results when you are logged in. Examine Amazon recommendations closely. Better yet, watch the TV shows and films recommended for you by Netflix.
Stephen E Arnold, September 21, 2013
September 17, 2013
Nothing involving text seems simple: lines of words that go on for miles, often without proper punctuation or any at all. It needs to be cataloged and organized and tagged, but no one really wants to do that task. That is why “TextBlob: Simplified Text Processing” was born. What exactly is TextBlob? Here is the description straight from TextBlob’s homepage:
“TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, translation, and more.”
TextBlob is available for free download and has its own GitHub following. When it comes to installing the library, be aware that it relies on NLTK and pattern.en. Its features include part-of-speech tagging, JSON serialization, word and phrase frequencies, n-grams, word inflection, tokenization, language translation and detection, noun phrase extraction, and sentiment analysis.
Once you have downloaded TextBlob, the Web site’s comprehensive quick start guide explains how to implement the library and make the best use of it. Free libraries make the open source community go around and improve ease of use for all users. If you use TextBlob, be sure to share any of your own libraries.
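Two of the listed features, word frequencies and n-grams, are easy to picture with a few lines of standard-library Python. The sketch below is not TextBlob’s implementation, and the function names and the crude regular-expression tokenizer are assumptions made for illustration. TextBlob itself offers comparable results through its `word_counts` and `ngrams` members, with far better tokenization.

```python
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer: lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(text):
    # Map each token to the number of times it appears.
    return Counter(tokenize(text))


def ngrams(text, n=3):
    # Slide a window of n tokens across the text.
    tokens = tokenize(text)
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sample = "Text processing is simple. Simple text processing saves time."
freqs = word_frequencies(sample)
bigrams = ngrams(sample, n=2)
```

Here `freqs["text"]` is 2 and the first entry of `bigrams` is `("text", "processing")`. A real library adds what this sketch skips: sentence boundaries, punctuation rules, and language-aware tokenization.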
Whitney Grace, September 17, 2013
September 15, 2013
If you have SharePoint responsibilities, you know how fabulous Microsoft’s Swiss Army knife solution is. Let me explain. The “fabulousness” applies to consultants, integrators, and “experts” who can make the rusty blade cut better than it does once the system is installed.
I learned about “SharePoint 2013 Search Query Tool” from one of the ArnoldIT SharePoint experts. You can download the tool to test out and debug search queries against the SharePoint 2013 REST API. The tool does not help improve either the system or the user queries, but I find this software interesting for three reasons:
- After years of Microsoft innovation, there are still issues with getting relevant results. Ergo the open source tool.
- SharePoint does not provide a native administrative function to perform this type of testing.
- Open source may be edging toward SharePoint. If the baby steps mature, will an open source snap in to replace the wild and crazy Fast Search & Transfer technology pop into being?
Stephen E Arnold, one of the world’s leading experts in information retrieval said:
Fast Search is on a technical par with SharePoint. The idea that two flawed systems can cope with changing user needs, Big Data, and unexpected system interactions is making SharePoint software which boosts costs. Change may be forced on Microsoft and without warning.
Worth thinking about and checking out the free widget.
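For readers who have not seen a SharePoint 2013 search REST query, here is a minimal sketch of how one is put together. The site address and the helper name are hypothetical. The `/_api/search/query` endpoint and the `querytext` and `rowlimit` parameters come from Microsoft’s documentation for the SharePoint 2013 Search REST service. The sketch only builds the URL; sending it requires authentication that depends on your installation.

```python
from urllib.parse import quote, urlencode


def build_search_url(site_url, query_text, row_limit=10):
    # String parameters in the Search REST API are wrapped in single quotes.
    params = {
        "querytext": f"'{query_text}'",
        "rowlimit": row_limit,
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"{site_url.rstrip('/')}/_api/search/query?" + urlencode(params, quote_via=quote)


# Hypothetical intranet address, used only for illustration.
url = build_search_url("https://intranet.example.com/", "search tuning", row_limit=5)
```

Pasting a URL like this into a query debugging tool, or into a browser session that is already signed in, is a quick way to see what the search service returns for a given query before blaming the users.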