Watson Scores a Mention in Data Dust Up
June 9, 2013
I read “How the U.S. Uses Technology to Mine More Data More Quickly.” You can read pundits, poobahs, mavens, and unemployed journalists’ views of data monitoring elsewhere. I want to point out that Watson, IBM’s Jeopardy-winning smart software, has made an appearance in the discussion of data intercepts.
Here’s the passage I noted:
I.B.M.’s Watson, the supercomputing technology that defeated human Jeopardy! champions in 2011, is a prime example of the power of data-intensive artificial intelligence. Watson-style computing, analysts said, is precisely the technology that would make the ambitious data-collection program of the N.S.A. seem practical. Computers could instantly sift through the mass of Internet communications data, see patterns of suspicious online behavior and thus narrow the hunt for terrorists. Both the N.S.A. and the Central Intelligence Agency have been testing Watson in the last two years, said a consultant who has advised the government and asked not to be identified because he was not authorized to speak.
From health care to government work, Watson is there. No further comment from the goose except that if one does not know what to query, math provides some candidates, not answers.
Stephen E Arnold, June 9, 2013
Sponsored by Xenky, the finder for ArnoldIT content
New WebProtege Version Available
June 9, 2013
The newest version of the open-sourced, web-based ontology editor WebProtégé is now available. For those unfamiliar, the tool’s description explains:
“WebProtégé is an open source, lightweight, web-based ontology editor. WebProtégé provides a friendly and highly configurable user interface that can be adapted for the use of domain experts. It has support for form-based editing and full-fledged collaboration.”
WebProtégé 1.0 will no longer be improved, so users are encouraged to migrate to the latest iteration, which is at Build 102 as of this writing. The release notes issue this important caution to those who have been using a local installation:
“We have renamed one of the portlets, which affects the default configurations of the projects. Please delete the default-ui-configuration-data from your WebProtege data directory, and then restart tomcat. WebProtege will copy the new default configurations back to that folder.”
Important detail, that. The administrator’s guide gives detailed instructions for installing, deploying, testing, updating, and troubleshooting. To migrate an ontology from a WebProtégé 1.0 server to the new version, see the instructions here. Users might also check out the hour-long webcast, which demonstrates some features and explains how to establish an account, upload an ontology, share, edit, and download revisions.
Cynthia Murrell, June 09, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Search and Content Processing Vendor in the Spotlight
June 8, 2013
Once again I have no opinion about allegations regarding data intercepts. Not my business. Here in Harrod’s Creek, I am thrilled to have electric power and a couple of dogs to accompany me on my morning walk in the hollow by the pond filled with mine drainage.
I did read a TPM story commenting about Palantir, a company which has more than $100 million in funding and now has a PR profile higher than the Empire State Building. The write up explains that a company with search, connectors, and some repackaged numerical recipes may be involved with certain US government activities.
Here’s a quote from a quote in the write up:
Apparently, Palantir has a software package called “Prism”: “Prism is a software component that lets you quickly integrate external databases into Palantir.” That sounds like exactly the tool you’d want if you were trying to find patterns in data from multiple companies.
The write up has some links to Palantir documents.
Several thoughts:
First, there are quite a few firms working in the same content processing sector as Palantir. Some of these you may know; for example, IBM. Others are probably off your radar and maybe drifting into oblivion like Digital Reasoning. The point is that many organizations looking to make money from search and content processing have turned to government contracts to stay afloat. Why haven’t real journalists and azure chip consultants cranking out pay to play profiles described the business functions of these outfits? Maybe these experts and former English majors are not such smart folks after all. Writing about Microsoft is just easier perhaps.
Second, the fancy math outfits are not confined to Silicon Valley. Nope, there are some pretty clever systems built and operated outside the US. You can find some nifty technology in such surprising places as downtown Paris, a Stockholm suburb, and far off Madrid. Why? There is a global appetite for software and systems which can make sense of Big Data. I don’t want to rain on anyone’s parade, but these systems do not vary too much. They use similar math, have similar weaknesses, and produce similar outputs. The reason? Ah, gentle reader, Big O helps make clear why fancy math systems are pretty much alike, as information access systems have been for decades.
Third, the marketers convince the bureaucrats that they have a capability which is bigger, faster, and cheaper. In today’s world this translates to giant server farms and digital Dysons. When the marketers have moved on to sell Teslas, lesser souls are left with the task of making the systems work.
My view is that we are in the midst of the largest single PR event related to search in my lifetime.
Will the discussion of search and content processing improve information access?
Nope.
Will the visibility alter the trajectory of hybrid systems which “understand” content?
Nope.
Will Big Data yield high value insights which the marketers promised?
Nope.
My thought is that there will be more marketing thrills in the search and content processing sector. Stay tuned but don’t use a fancy math system to pick your retirement investment, the winner of today’s Belmont, or do much more than deliver a 1970s type of survey output.
Oh, the Big O. The annoying computational barriers. The need to recycle a dozen or so well known math methods juiced with some visualizations.
The search and content processing bandwagon rolls forward. The cloud of unknowing surrounds information access. What’s new?
Stephen E Arnold, June 8, 2013
Sponsored by Xenky, the ArnoldIT portal.
Toronto Hackathon
June 8, 2013
Are hackers a good thing or a bad thing? In the realm of computers, the term used to simply refer to those breaking into the systems of others (bad), but has gained some positive definitions along the way. “Hacking” can now refer to heavily modifying one’s own system or devising unique solutions to challenging problems. PRWeb informs us that “Semantria and Lexalytics Excited to Provide Unlimited API Access for Viafoura Hackathon.” I think organizers had one of the less nefarious definitions in mind. The write-up informs us:
“The Viafoura Hackathon is part of Big Data Week, an international festival focusing on the social, political, technological and commercial impacts of Big Data.
“‘We’re very pleased to partner up with Semantria and Lexalytics on this Hackathon,’ said Ali Ghafour, Viafoura Founder and CTO. ‘We’re excited to see what people will come up with by combining large datasets from media companies with high-end Natural Language Processing technology. Viafoura loves these types of challenges, and we are happy to have Semantria/Lexalytics and the Toronto development community join us.’”
Perhaps I’m a purist, but personally I’d rather a term not gather meanings like a dog in an open field gathers grass seeds. Nevertheless, I sincerely hope all the “hackers” had a good time.
Large media companies rely on Viafoura for audience engagement and monetization solutions. The company, which is headquartered in Toronto, hosted Big Data Week in recognition of big data’s booming importance.
Founded in 2003, Lexalytics creates text mining software for integration into third-party software. The company co-founded Semantria, a services and SAAS firm specializing in cloud-based text and sentiment analysis. That outfit launched in 2011.
Cynthia Murrell, June 08, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
The Old Bayesian Recipe: Burning the Predictive Reality Cupcakes
June 7, 2013
I don’t have any comment about the alleged surveillance conducted by governments or the comments of giant online vendors about their alleged interactions with governments. I will leave the subject and the speculations to folks younger than I and possibly, just possibly, less well informed.
I do want to call attention to the write up “How Likely Is the NSA PRISM Program to Catch a Terrorist?” The source is the Bayesian Biologist. I know less about PRISM, biology, and Bayes than my neighbors here in Harrod’s Creek, Kentucky.
Here’s the snippet I noted in the “How Likely” story:
for every positive (the NSA calls these ‘reports’) there is only a 1 in 10,102 chance (using our rough assumptions) that they’ve found a real bad guy. Big brother is always watching, but he’s still got a needle in a haystack problem.
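The arithmetic behind a figure like this is Bayes’ theorem applied to a very rare event. Here is a minimal sketch. The three inputs are my assumptions for illustration, not numbers taken from the passage above: a one in a million base rate, a 99 percent detection rate, and a 1 percent false alarm rate. With those assumed inputs the calculation lands on the quoted 1 in 10,102.

```python
from fractions import Fraction


def posterior(prior, true_positive_rate, false_positive_rate):
    """Return P(bad guy | flagged) using Bayes' theorem."""
    flagged_and_bad = true_positive_rate * prior
    flagged_and_innocent = false_positive_rate * (1 - prior)
    return flagged_and_bad / (flagged_and_bad + flagged_and_innocent)


# Assumed inputs for illustration only (not stated in the quoted passage).
prior = Fraction(1, 1_000_000)   # share of people who are real bad guys
tpr = Fraction(99, 100)          # chance a real bad guy gets flagged
fpr = Fraction(1, 100)           # chance an innocent person gets flagged

p = posterior(prior, tpr, fpr)
one_in = 1 / p                   # exact, because Fraction avoids rounding
print(f"about 1 in {round(one_in):,}")
```

The false alarms swamp the hits because the innocent population is so large. Even a 1 percent error rate applied to nearly everyone produces far more flags than the handful of true positives.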
I think that there might be some fascinating marketing hype, fear, and salami in the digital blender.
In my most recent lecture about Big Data and the limitations of today’s software, I made these points:
Collecting is one thing. Finding is another.
Search, content processing, and analytics work well in certain circumstances; for example, trimmed data sets which match the textbook checklists for valid inputs and when key facts are known such as the name and aliases of an entity.
Today’s systems — no matter what the marketers say — have been designed to work within some constraints. Marketers and fear mongers don’t have to cope with computational realities.
Stephen E Arnold, June 7, 2013
Yandex: No Query Is an Island
June 7, 2013
Well, almost. If a child is the victim of a snake bite, the doctor wants to query a system and get specific information to save the victim. I am not sure if in some search situations an offer for a vacation trip to Belize is a plus.
Nevertheless, Yandex, the Google nemesis in certain European countries and one of my go to resources, is now offering an “island” service. Here’s the explanation:
Yandex’s new search results page consists of interactive blocks — islands. These blocks are the first step to the user’s search goal and can be anything from factual information to purchase buttons or order forms. Yandex Islands give website owners a chance to directly connect with their visitors, while web users can instantly see and choose the best and most relevant solution to their problem.
Give the service a spin.
Stephen E Arnold, June 7, 2013
Sponsored by Xenky, the portal to ArnoldIT.com
Microsoft Misusing Their Own Mountain of Searchable Data
June 7, 2013
Microsoft is sitting on a search goldmine and people are just starting to see it. Whenever you Skype, have you thought about the data you are releasing into the world? Probably not. But Skype’s owners have, as we discovered in a fascinating article from The H Security, “Skype With Care—Microsoft is Reading Everything you Write.”
According to the story:
A spokesman for the company confirmed that it scans messages to filter out spam and phishing websites. This explanation does not appear to fit the facts, however. Spam and phishing sites are not usually found on HTTPS pages. By contrast, Skype leaves the more commonly affected HTTP URLs, containing no information on ownership, untouched. Skype also sends head requests which merely fetches administrative information relating to the server. To check a site for spam or phishing, Skype would need to examine its content.
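The difference the quoted passage draws between a HEAD request and examining a page’s content can be seen in a small sketch. This says nothing about how Skype actually works. It starts a throwaway local web server (the handler and the page are hypothetical) and compares what a HEAD request and a GET request bring back.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page served by the local demo server.
PAGE = b"<html><body>Example page body</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_HEAD(self):
        # HEAD: status line and headers only, no body.
        self._send_headers()

    def do_GET(self):
        # GET: the same headers followed by the page content.
        self._send_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(method, port):
    """Send one request and return (status, Content-Length header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/")
    resp = conn.getresponse()
    body = resp.read()
    result = (resp.status, resp.getheader("Content-Length"), body)
    conn.close()
    return result


def compare_head_and_get():
    """Run the demo server briefly and return the HEAD and GET results."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("HEAD", port), fetch("GET", port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    head, get = compare_head_and_get()
    print("HEAD body bytes:", len(head[2]))
    print("GET body bytes:", len(get[2]))
```

Both requests return the same status and headers, but only the GET carries the page itself. A checker that stops at HEAD never sees the text it would need to judge.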
Honestly, this should not come as a shock to anyone. Frankly, those interested in search should be paying close attention. They should be asking: will Microsoft’s search system be able to index the content and provide relevant results in a timely, accurate manner? We don’t know, but if Yahoo!’s recently collapsed partnership with Microsoft is any indication, the company probably isn’t putting that Skype data to good use.
Patrick Roland, June 07, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
HP and SAP Spat Could Be Trouble for Stocks
June 7, 2013
Up until now the world of big data analytics has been a relatively friendly, yet competitive one. There always seems to be news of new partnerships and business deals among providers. However, as big money gets bigger, big data is getting mouthy. We discovered just how one war of words began in a recent Register article, “HP Tried to Offload Autonomy on SAP, says SAP Co-Chief.”
According to the story:
So, if you’re keeping track: Oracle didn’t want Autonomy. Oracle says it was approached about a sale before HP bought it. Dell didn’t want Autonomy, turning it down before HP bought it, too. Now, SAP didn’t want Autonomy, either, turning it down after HP bought it.
Poor Autonomy, right? Well, not so fast. This is where the story starts getting interesting. The battle over this beleaguered software title has begun to resemble political campaign double speak. Notably, there was the article in the Business Insider, which saw HP stating: “Contrary to reports in the media, HP has no interest in selling Autonomy. During the past year, we’ve received inquiries from SAP about purchasing HP software assets, and time and again we’ve said ‘no.’” Either way, this looks like the makings of an interesting rivalry. However, if squabbling continues, we’d expect stock prices for both companies to waver. In the meantime, why not use LucidWorks or one of the venture backed “been around a while” systems?
Patrick Roland, June 07, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Google Needs Predictive Analytic Enlightenment
June 7, 2013
Search is not a perfect thing. While many platforms look like they have the answer all the time, the fact is that human curiosity far outreaches what any search system can do, even the behemoth of search. We discovered exactly how in a recent CNET story, “Google Search Scratches its Head 500 Million Times a Day.”
According to the story:
On a daily basis, 15 percent of queries submitted — 500 million — have never been seen before by Google’s search engine, and that has continued for the nearly 15 years the company has existed, according to John Wiley, the lead designer for Google Search.
Okay, that’s understandable and completely shocking from a volume perspective. But, hey, here’s a novel idea, Google, how about predicting what users want for that 15 percent? Sure, it may be less relevant than selling ads, as Freakonomics recently noted. Meanwhile, tons of smaller companies are perfecting predictive analytics, which Google apparently hasn’t. For those looking to get a handle on predictive analytics (*Cough* Google) we recently found a terrific guide which lays out the ins and outs of analytics for those looking for enlightenment.
Patrick Roland, June 07, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
LucidWorks Continues Training through Webinars
June 7, 2013
SearchHub is one way that LucidWorks keeps in touch with the open source developer community, particularly those concerned with Apache Lucene and Solr. In addition to providing videos, podcasts, and other reference materials, LucidWorks also posts upcoming webinars and other training opportunities. Check out the latest in the entry, “Webinar: Solr 4, the NoSQL Search Server.”
The webinar will cover:
“The long awaited Solr 4 release brings a large amount of new functionality that blurs the line between search engines and NoSQL databases. Now you can have your cake and search it too with Atomic updates, Versioning and Optimistic Concurrency, Durability, and Real-time Get! Learn about new Solr NoSQL features and implementation details of how the distributed indexing of Solr Cloud was designed from the ground up to accommodate them.”
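For readers wondering what “Atomic updates” and “Optimistic Concurrency” look like in practice, here is a minimal sketch of the JSON body such an update uses. The document id and field names are hypothetical, and nothing is sent to a server. The sketch only builds the payload, relying on Solr’s documented `set` and `inc` modifiers and its `_version_` field.

```python
import json


def atomic_update(doc_id, changes, expected_version=None):
    """Build the JSON body for a Solr atomic update.

    changes maps a field name to an (operation, value) pair, where the
    operation is a Solr update modifier such as "set", "add", or "inc".
    """
    doc = {"id": doc_id}
    for field, (operation, value) in changes.items():
        doc[field] = {operation: value}
    if expected_version is not None:
        # Optimistic concurrency: Solr rejects the update if the stored
        # document's _version_ does not match this value.
        doc["_version_"] = expected_version
    return json.dumps([doc])


# Hypothetical document and fields, for illustration only.
payload = atomic_update(
    "book-1",
    {"price": ("set", 19.95), "copies_sold": ("inc", 1)},
    expected_version=1234567890,
)
print(payload)
```

Only the named fields change; the rest of the stored document is left alone. If another writer has modified the document since the version was read, the update fails and the client can re-read and retry.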
LucidWorks continues to invest in the open source community through such training and support opportunities. LucidWorks as a company is known for the support and services that surround its value-added enterprise search and Big Data solutions. But LucidWorks is also committed to the foundation of its success: the open source community and the innovation and agility it brings.
Emily Rae Aldridge, June 7, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search