December 23, 2013
The BBC News Technology article titled “Bots Now ‘Account for 61% of Web Traffic’” expands on data from a recent Incapsula study, which found that humans may account for only a shrinking minority of internet traffic. Last year’s figure was closer to fifty-fifty, but this is not as scary as it might sound, since most of the ‘bots’ generating this traffic are tools search engines use to index website content. There are also other ‘good bots,’ such as those analytics companies use to rate website performance. The article records some reservations about the numbers from Dr. Ian Brown of the Oxford University Cyber Security Centre:
“There will also be some unavoidable fuzziness in their data, given that they are trying to measure malicious website visits where by definition the visitors are trying to disguise their origin.”
Despite the overall growth in bot activity, the firm said that many of the traditional malicious uses of the tools had become less common. It said there had been a 75% drop in the frequency with which spam links were being automatically posted.
Part of the explanation for this drop is credited to Google’s vigilance over the last year in stamping out the practice. In more good news, Incapsula also reported a 10% drop in hacking activities such as stealing credit cards and hijacking sites (grouped together under the term “tool bot activities”).
Chelsea Kerwin, December 23, 2013
December 22, 2013
Love talking about Big Data? I recommend doing a bit of reading. I found “What I Learned from 2 Years of Data Sciencing” refreshing. Quotes I noted follow; a short sketch of the cleaning work they describe appears after the list:
- With reference to Big Data projects where the author worked: “None of these projects gained traction within the company and became abandoned.”
- With reference to the work required: “Much of the efforts spent for those projects were in getting the right data into the right shape.”
- “Little did I know that we’ll be cleaning and shaping data for most of my second year at uSwitch.”
- “In practice, I was just cleaning and shaping data.”
- “Figuring out the right work to do is one of the most difficult tasks for a data science team. It doesn’t help with the fact that the data science role is so vague.”
- “Figuring out where to devote our time and effort is not as easy as it sounds.”
- “Unless someone or something can act on the data, results can only satisfy intellectual curiosity. A business can’t survive on funding people to carry out academic studies forever.”
- “If cleaning vast amount of data, being clueless as to what to do, and debating with colleagues sound like a challenge that you want to take on, I know a company in London that’s looking for a data scientist!”
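Since “cleaning and shaping data” is the refrain, a concrete picture may help. Below is a minimal pandas sketch of that sort of work; the file name and columns are invented for illustration and are not from the uSwitch write-up:

```python
import pandas as pd

# Hypothetical input; the file and column names are invented for illustration.
df = pd.read_csv("signups.csv")

# Cleaning: normalize inconsistent text values.
df["country"] = df["country"].str.strip().str.upper()

# Cleaning: parse dates that arrive as strings; unparseable rows become NaT.
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

# Cleaning: drop duplicates and rows missing the fields the analysis needs.
df = df.drop_duplicates(subset="user_id")
df = df.dropna(subset=["signup_date", "country"])

# Shaping: signups per country per month, the "right shape" for reporting.
monthly = (
    df.set_index("signup_date")
      .groupby("country")
      .resample("M")
      .size()
)
print(monthly.head())
```

Nothing here is clever; that is the point. Multiply this across dozens of messy sources and it becomes a year of someone’s time.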
Is there a message here about the nuts and bolts of data? Is analytics repeating the sins of the first enterprise search vendors? It is so much easier to sell sizzle than to focus on basics like figuring out what is important and getting valid data. “Let’s just take the easy path” seems to be one risk for analytics cheerleaders.
Stephen E Arnold, December 22, 2013
December 21, 2013
If you need a business intelligence solution, Attivio apparently wants to be your one-stop shop. The company has formed two strategic partnerships. The Providence Journal announced that “Actian And Attivio OEM Agreement Accelerates Big Data Business Value By Integrating Big Content.” Actian, a big data analytics company, has an OEM agreement with Attivio to use its Active Intelligence Engine (AIE) to ramp up its data analytics solution. AIE supports Actian’s goal of delivering analytics on all types of data, from social media to surveys to research documents.
The article states:
“‘Big Content has become a vital piece in the Big Data puzzle,’ said David Schubmehl, Research Director, IDC. ‘The majority of enterprise information created today is human-generated, but legacy systems have traditionally required processing structured data and unstructured content separately. The addition of Attivio AIE to Actian ParAccel provides an extremely cost-effective option that delivers impressive performance and value.’”
Panorama announced on its official Web site that, “Panorama And Attivio Announce BI Technology Alliance Partnership.” The AIE will be combined with Panorama’s software to improve the business value of content and big data. Panorama’s BI solution will use the AIE to streamline enterprise decision-making processes by eliminating the need to switch between applications to access data. This will speed up business productivity and improve data access.
The article explains:
“ ‘One of the goals of collaborative BI is to connect data, insights and people within the organization,’ said Sid Probstein, CTO at Attivio. ‘The partnership with Panorama achieves this because it gives customers seamless and intuitive discovery of information from sources as varied as corporate BI to semi-structured data and unstructured content.’”
Attivio is a tool for improving big data projects and making better use of data. The company’s strategy of serving as a base on which other solutions are built is similar to what Fulcrum Technologies did in 1985.
Whitney Grace, December 21, 2013
December 20, 2013
Attensity is a name that comes to mind when organizations need to track social analytics for customer relationship management. The company has not received positive PR in the past year, but when we recently visited Attensity’s management Web page, we noticed a few new faces with impressive résumés. Will these new board members take the company out of the red and place it on the right path?
Let us review each person. Howard Lau joined Attensity in January 2013, says his LinkedIn page, and he has twenty-five years in the business software sector. He was previously an executive at SAP Labs, SAP Ventures, and East Gate Capital. He is now Attensity’s CEO and chairman. Lau is a venture capitalist who has delivered returns of four times investors’ original money. He is knowledgeable and has the right experience to turn Attensity around. He checks out well.
Thomas Dreikauss is the general manager of Attensity GmbH in Europe and carries the large responsibility of running business development across Western Europe. He has worked in sales management and marketing of enterprise software for over twenty years. Dreikauss has proven he can build strong teams and help companies expand beyond the startup stage. He has worked at Inxight Software GmbH, Xerox PARC, and Business Objects. He was probably brought onto the team because he is known for helping companies grow when times are tough. Another good apple.
Next up is Chief Financial Officer Frank Brown:
“Frank brings over 25 years of experience in the technology and finance industries. Prior to Attensity, he has worked with a number of leading companies in the software, communications, and semiconductor industries, at the executive and board level, to chart corporate strategy and manage internal operations. Frank’s experience includes positions with IBM Corporation, Andersen Consulting, Oracle Corporation and Lehman Brothers. Frank’s background also includes a number of years in the investment banking and venture capital industries. His successful track record as a venture capitalist includes investments across the technology and healthcare sectors. As the founder of Amber Ventures, Frank has worked as a senior finance executive in a variety of privately held technology companies guiding their activities in areas such as budgeting, accounting, fundraising and mergers and acquisitions. Frank received his M.B.A. from The Wharton School of the University of Pennsylvania and graduated from the University of California, Berkeley with a B.S. in Decision Sciences, Finance and Accounting.”
Brown has the important duty of bringing in revenue and rerouting financial plans. It is a difficult position to be in, especially when a company is trying to reinvent itself. Experience and openness to new ideas are what Attensity should rely on as it tries to get back on track. It will be a long, winding path up the mountain. These three will act as the climbing poles that keep Attensity from falling.
Whitney Grace, December 20, 2013
December 18, 2013
Interesting: it seems the venerated Thomas Bayes is now with us in database land. BayesDB is being developed, in conjunction with an analysis method called CrossCat, by a team of folks from MIT’s Probabilistic Computing Project and the Shafto Lab at the University of Louisville.
The project’s page explains:
“BayesDB, a Bayesian database table, lets users query the probable implications of their data as easily as a SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with no statistics training can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probable observations, and identifying statistically similar database entries.
BayesDB is suitable for analyzing complex, heterogeneous data tables with up to tens of thousands of rows and hundreds of variables. No preprocessing or parameter adjustment is required, though experts can override BayesDB’s default assumptions when appropriate.
BayesDB’s inferences are based in part on CrossCat, a new, nonparametric Bayesian machine learning method, that automatically estimates the full joint distribution behind arbitrary data tables.”
The database is designed for two types of folks: those with no statistics chops who nonetheless have tabular data to analyze, and those proficient in statistics who face a non-standard problem or lack the time or patience for custom modeling. The team credits CrossCat in part with making BayesDB possible, but also says the BQL language was key to its development.
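For the curious, here is a sketch of what a BQL session might look like, based on the capabilities the project page lists. The client import and exact statement syntax are assumptions on our part, not a verified copy of the shipped interface; consult the project’s documentation for the real thing:

```python
# Hypothetical sketch: the import path and exact BQL syntax are assumptions
# based on the project's description, not a verified copy of the shipped API.
from bayesdb.client import Client

client = Client()

# Load a table and let BayesDB fit its models to the data.
client("CREATE BTABLE people FROM people.csv;")
client("ANALYZE people FOR 100 ITERATIONS;")

# "Inferring missing values": fill blanks, subject to a confidence threshold.
client("INFER age FROM people WITH CONFIDENCE 0.9;")

# "Simulating probable observations": draw plausible rows from the model.
client("SIMULATE salary FROM people WHERE age = 30 TIMES 5;")

# "Detecting predictive relationships between variables."
client("ESTIMATE PAIRWISE DEPENDENCE PROBABILITY FROM people;")
```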
The description includes examples, a discussion of which types of data and problems the database addresses best, reasons to trust the results, why they named it BayesDB, and more. Check out the page for all the details.
Cynthia Murrell, December 18, 2013
December 16, 2013
The author of “2013: The Year ‘the Stream’ Crested” is focused on tapping into flows of data. Twitter and real-time “Big Data” streams are the subtext of the essay. I liked the analysis. In one 2,500-word write-up, the severe weaknesses of enterprise and Web search systems are exposed.
The main point of the article is that “the stream”—that is, flows of information and data—is what people want. The flow is of sufficient volume that making sense of it is difficult. Therefore, an opportunity exists for outfits like The Atlantic to provide curation, perspective, and editorial filtering. The write up’s code for this higher-value type of content process is “the stock.”
The article asserts:
This is the strange circumstance that obtained in 2013, given the volume of the stream. Regular Internet users only had three options: 1) be overwhelmed 2) hire a computer to deploy its logic to help sort things 3) get out of the water.
The takeaway for me is that the article makes clear that search and retrieval just don’t work. Something “new” is needed. Perhaps this frustration with search is the trigger behind the interest in “artificial intelligence” and “machine learning”? Predictive analytics may have a shot at solving the problem of finding and identifying needed information, but from what I have seen, there is a lot of talk about fancy math and little evidence that it works at low cost in a manner that makes sense to the average person. Data scientists are not a dime a dozen. Average folks are.
Will the search and content processing vendors step forward and provide concrete facts that show a particular system can solve a Big Data problem for Everyman and Everywoman? We know Google is shifting to an approach to search that yields revenue. Money, not precision and recall, is increasingly important. The search and content vendors who toss around the word “all” have not been able to deliver unless the content corpus is tightly defined and constrained.
Isn’t it obvious that processing infinite flows and changes to “old” content is likely to cost a lot of money? Google, Bing, and Yandex search are not particularly “good.” Each is becoming a system designed to support other functions. In fact, looking for information that is only five or six years “old” is an exercise in frustration. Where has that document “gone”? What other data are not in the index? The vendors are not talking.
In the enterprise, the problem is almost as hopeless. Vendors invent new words to describe a function that seems to convey high value. Do you remember this catchphrase: “One step to ROI”? How do you think that company performed? The founders were able to sell the company and some of the technology lives on today, but the limitations of the system remain painfully evident.
Search and retrieval is complex, expensive to implement in an effective manner, and stuck in a rut. Giving away a search system seems to reduce costs, but are license fees the major expense? Embracing fancy math seems to deliver high-value answers, but are the outputs accurate? Users just assume these systems work.
Kudos to The Atlantic for helping make clear that in today’s data world, something new is needed. Changing the words used to describe such out-of-favor functions as “editorial policy,” controlled terms, scheduled updates, and the like is more popular than innovation.
Stephen E Arnold, December 16, 2013
December 15, 2013
If you are interested in “artificial intelligence” or “artificial general intelligence,” you will want to read “Creative Blocks: The Very Laws of Physics Imply That Artificial Intelligence Must Be Possible. What’s Holding Us Up?” Artificial general intelligence is a discipline that seeks to replicate the human brain in a computing device.
Dr. Deutsch asserts:
I cannot think of any other significant field of knowledge in which the prevailing wisdom, not only in society at large but also among experts, is so beset with entrenched, overlapping, fundamental errors. Yet it has also been one of the most self-confident fields in prophesying that it will soon achieve the ultimate breakthrough.
Views on whether a machine’s brain can work like a human’s have, says Dr. Deutsch:
split the intellectual world into two camps, one insisting that AGI was none the less impossible, and the other that it was imminent. Both were mistaken. The first, initially predominant, camp cited a plethora of reasons ranging from the supernatural to the incoherent. All shared the basic mistake that they did not understand what computational universality implies about the physical world, and about human brains in particular. But it is the other camp’s basic mistake that is responsible for the lack of progress. It was a failure to recognize that what distinguishes human brains from all other physical systems is qualitatively different from all other functionalities, and cannot be specified in the way that all other attributes of computer programs can be. It cannot be programmed by any of the techniques that suffice for writing any other type of program. Nor can it be achieved merely by improving their performance at tasks that they currently do perform, no matter by how much.
One of the examples Dr. Deutsch invokes is Watson, IBM’s game show “winning” computer. He explains:
Nowadays, an accelerating stream of marvelous and useful functionalities for computers are coming into use, some of them sooner than had been foreseen even quite recently. But what is neither marvelous nor useful is the argument that often greets these developments, that they are reaching the frontiers of AGI. An especially severe outbreak of this occurred recently when a search engine called Watson, developed by IBM, defeated the best human player of a word-association database-searching game called Jeopardy. ‘Smartest machine on Earth’, the PBS documentary series Nova called it, and characterized its function as ‘mimicking the human thought process with software.’ But that is precisely what it does not do. The thing is, playing Jeopardy — like every one of the computational functionalities at which we rightly marvel today — is firmly among the functionalities that can be specified in the standard, behaviorist way that I discussed above. No Jeopardy answer will ever be published in a journal of new discoveries. The fact that humans perform that task less well by using creativity to generate the underlying guesses is not a sign that the program has near-human cognitive abilities. The exact opposite is true, for the two methods are utterly different from the ground up.
IBM surfaces again with regard to playing chess, a trick the company demonstrated years ago:
Likewise, when a computer program beats a grandmaster at chess, the two are not using even remotely similar algorithms. The grandmaster can explain why it seemed worth sacrificing the knight for strategic advantage and can write an exciting book on the subject. The program can only prove that the sacrifice does not force a checkmate, and cannot write a book because it has no clue even what the objective of a chess game is. Programming AGI is not the same sort of problem as programming Jeopardy or chess.
After I read Dr. Deutsch’s essay, I refreshed my memory about the opposing view. You can find an interesting rebuttal by AGI researcher Ben Goertzel, published on now-Googler Ray Kurzweil’s site, in “The Real Reasons We Don’t Have AGI Yet.” The key assertions are:
The real reasons we don’t have AGI yet, I believe, have nothing to do with Popperian philosophy, and everything to do with:
- The weakness of current computer hardware (rapidly being remedied via exponential technological growth!)
- The relatively minimal funding allocated to AGI research (which, I agree with Deutsch, should be distinguished from “narrow AI” research on highly purpose-specific AI systems like IBM’s Jeopardy!-playing AI or Google’s self-driving cars).
- The integration bottleneck: the difficulty of integrating multiple complex components together to make a complex dynamical software system, in cases where the behavior of the integrated system depends sensitively on every one of the components.
Goertzel concludes:
The difference between Deutsch’s perspective and my own is not a purely abstract matter; it does have practical consequence. If Deutsch’s perspective is correct, the best way for society to work toward AGI would be to give lots of funding to philosophers of mind. If my view is correct, on the other hand, most AGI funding should go to folks designing and building large-scale integrated AGI systems.
These discussions are going to be quite important in 2014. As search systems do more thinking for the human user, disagreements that appear to be theoretical will have a significant impact on what information is displayed for a user.
Do users know that search results are shaped by algorithms that “think” they are smarter than humans? Good question.
Stephen E Arnold, December 15, 2013
December 15, 2013
I know that the search engine optimization folks are already on top of this idea, but for the mere mortals of the “search” world, check out “Voevodsky’s Mathematical Revolution.” Vladimir Voevodsky is a Fields Medal winner who was looking for fresh challenges. He hit upon one: the use of a computer to verify proofs. The write-up explains that the appeal of the “new foundation is that the fundamental concepts are much closer to where ordinary mathematicians do their work.”
The comment I noted pertains to mathematical proofs. As you know, creating a proof is, for many, the essence of mathematics. However, verifying proofs is tough work. The quote I noted is:
“I can’t see how else it will go,” he said. “I think the process will be first accepted by some small subset, then it will grow, and eventually it will become a really standard thing. The next step is when it will start to be taught at math grad schools, and then the next step is when it will be taught at the undergraduate level. That may take tens of years, I don’t know, but I don’t see what else could happen.”
The consequence of proof assistants like Coq is even more interesting:
He also predicts that this will lead to a blossoming of collaboration, pointing out that right now, collaboration requires an enormous trust, because it’s too much work to carefully check your collaborator’s work. With computer verification, the computer does all that for you, so you can collaborate with anyone and know that what they produce is solid. That creates the possibility of mathematicians doing large-scale collaborative projects that have been impractical until now.
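For readers who have never seen machine-checked mathematics, here is a minimal sketch in Lean, a proof assistant comparable to the Coq system mentioned above. The theorem and tactic steps are our illustration, not drawn from the article; the point is that the proof checker, not a trusting collaborator, certifies every step:

```lean
-- A tiny machine-checked proof: zero is a left identity for addition.
-- The Lean kernel verifies each step; no human referee is required.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl                        -- base case: 0 + 0 = 0 by definition
  | succ n ih => rw [Nat.add_succ, ih] -- step: rewrite with the hypothesis
```

Once the checker accepts a proof like this, a collaborator can build on it without re-verifying it by hand, which is exactly the large-scale collaboration Voevodsky predicts.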
Stephen E Arnold, December 15, 2013
December 15, 2013
Writer Mellisa Tolentino assesses the state of big data in “Big Data Economy: The Promises + Hindrances of BI, Advanced Analytics” at SiliconAngle. Pointing to the field’s expected $50 billion in revenue by 2017, she says the phenomenon has given rise to a “Data Economy.” The article notes that enterprises in a number of industries have been employing big data tech to increase their productivity and efficiency.
However, there are still some wrinkles to be ironed out. One is the cumbersome process of pulling together data models and curating data sources, a real time-suck for IT departments. This problem, though, may find resolution in nascent services that will take care of all that for a fee. The biggest issue may be the debate over open source solutions.
The article explains:
“Proponents of the open-source approach argue that it will be able to take advantage of community innovations across all aspects of product development, that it’s easier to get customers especially if they offer fully-functioning software for free. Plus, they say it is easier to get established partners that could easily open up market opportunities.
Unfortunately, the fully open-source approach has some major drawbacks. For example, the open-source community is often not united, making progress slower. This affects the long-term future of the product and revenue; plus, businesses that offer only services are harder to scale. As for the open core approach, though it has the potential to create value differentiation faster than the open source community, experts say it can easily lose its value when the open-source community catches up in terms of functionality.”
Tolentino adds that vendors can find themselves in a reputational bind when considering open source solutions: If they eschew the open core approach, they may be seen as refusing to support the open source community. However, if they do embrace open source solutions, some may accuse them of taking advantage of that community. Striking the balance while doing what works best for one’s company is the challenge.
Cynthia Murrell, December 15, 2013
December 11, 2013
I read about Palantir and its successful funding campaign in “Palantir’s Latest Round Valuing It at $9B Swells to $107.8M in New Funding.” Compared to ordinary search and content processing companies, Palantir is obviously able to attract investors better than most other firms that make sense out of data.
If you run a query for “Palantir” on Beyond Search, you will get links to articles about the company’s previous funding and to a couple of stories about the company’s interaction with IBM i2 related to an allegation about Palantir’s business methods.
I find Palantir interesting for three reasons.
First, it is able to generate significant buzz among police and intelligence entities in a number of countries. Based on what I have heard at conferences, the Palantir visualizations knock the socks off highly placed officials who want killer graphics in their personal slide presentations.
Second, the company has been nosing into certain financial markets. The idea is that the Palantir methods will give some of the investment outfits a better way to figure out what’s going up and what’s going down. The visuals are good, I have heard, and the Palantir analytics are perceived, if my sources are accurate, as better than those from companies like IBM SPSS, Digital Reasoning, Recorded Future, and similar analytics firms.
Third, the company may have moved into a new business sector. The firm’s success in fundraising raises the question, “Is Palantir becoming a vehicle to raise more and more cash?”
Palantir is worth monitoring. The visualizations and the math are not really a secret sauce. The magic ingredient at Palantir may be its ability to sell its upside to investors. Is Palantir introducing a new approach to search and content processing? The main business of the company could be raising more and more money.
Stephen E Arnold, December 11, 2013