Stop Typing: Linguastat Does It For You
March 22, 2014
Web content helps companies attract attention and keep themselves in relevant social media feeds and search results, but generating that content is time consuming. Linguastat claims to offer a solution that combines the power of big data with automated content.
Linguastat tells users that it will turn “haystacks into gold.” It is an interesting tagline, but not as believable as Linguastat’s software description:
“Built on proprietary natural language and artificial intelligence our cloud-based Content Transformation Platform ™ reads, understands, and transforms the vast amount of Big Data found in the world and automatically publishes unique, insightful, and optimized digital stories…at massive scale…at a fraction of the cost!”
If your company is tired of hiring third parties or spending valuable employee time developing Web content, Linguastat offers a solution. It will annotate and analyze your big data, and the software’s AI will generate “optimized digital stories.” It also saves typing time and spares people from inflamed carpal tunnel syndrome.
Without seeing a finished product, and given the less than appealing “turning haystacks into gold” tagline, it is wise to be skeptical about Linguastat. The company might be worth researching, however, and perhaps worth a trial run.
Whitney Grace, March 22, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Infobright at the Mobile World Congress Focuses on Big Data and the Internet of Things
March 21, 2014
The article titled 2014 Mobile World Congress Highlights—Musings of an MWC Veteran on Infobright offers some of the sunny spots from this year’s Mobile World Congress. The event has been held in Barcelona for the past eight years, and this year’s gathering boasted some 70,000 attendees and keynote speaker Mark Zuckerberg. It was also, as the article describes, Infobright’s first year exhibiting on the trade show floor. The article explores such areas of interest as the Internet of Things, “monetization,” and the boosted attendance of mobile commerce vendors. The article states,
“There was a noticeable increase in the presence of mobile commerce vendors. Again, this ranged from transaction processing infrastructure to user experience applications for transactions, making payments, transferring funds, etc. The major credit card vendors’ presence was highly visible in this area. In underserved/under developed parts of the world, mobile platforms create a tremendous opportunity for enabling the movement of money and commerce.”
In answer to the self-imposed question of what Infobright has to do with MWC, the article exclaims: Big Data, that’s what! The article describes the “avalanche” of data that all of this technology revolves around. Infobright promises that it can analyze big data with the necessary flexibility and speed, and it offers a solution for anyone who wants to query machine data.
Chelsea Kerwin, March 21, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Google Flu Trends: How Algorithms Get Lost
March 15, 2014
Run a query for Google Flu Trends on Google. The results point to the Google Flu Trends Web site at http://bit.ly/1ny9j58. The graphs and charts seem authoritative. I find the colors and legends difficult to figure out, but Google knows best. Or does it?
A spate of stories in New Scientist, Smithsonian, and Time picks up the thread that Google Flu Trends does not work particularly well. The Science Magazine podcast presents a quite interesting interview with David Lazer, one of the authors of “The Parable of Google Flu: Traps in Big Data Analysis.”
The point of the Lazer article and the greedy recycling of the analysis is that algorithms can be incorrect. What is interesting is the surprise that creeps into the reports of Google’s infallible system being dead wrong.
For example, Smithsonian Magazine’s “Why Google Flu Trends Can’t Track the Flu (Yet)” states, “The vaunted big data project falls victim to periodic tweaks in Google’s own search algorithms.” The write-up continues:
A huge proportion of the search terms that correlate with CDC data on flu rates, it turns out, are caused not by people getting the flu, but by a third factor that affects both searching patterns and flu transmission: winter. In fact, the developers of Google Flu Trends reported coming across particular terms—those related to high school basketball, for instance—that were correlated with flu rates over time but clearly had nothing to do with the virus. Over time, Google engineers manually removed many terms that correlate with flu searches but have nothing to do with flu, but their model was clearly still too dependent on non-flu seasonal search trends—part of the reason why Google Flu Trends failed to reflect the 2009 epidemic of H1N1, which happened during summer. Especially in its earlier versions, Google Flu Trends was “part flu detector, part winter detector.”
Oh, oh. Feedback loops, thresholds, human bias. Quite a surprise, apparently.
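The winter-detector problem is easy to reproduce. Below is a minimal Python sketch (a toy illustration of the general point, not Google’s actual model or data) showing how a purely seasonal signal, such as searches about high school basketball, can correlate strongly with flu incidence even though there is no causal link:

```python
# Toy example: a seasonal, non-causal signal (searches for "high school
# basketball") correlates strongly with flu incidence simply because both
# peak in winter. This is NOT Google's model, just an illustration.
import numpy as np

rng = np.random.default_rng(0)
weeks = np.arange(156)  # three years of weekly data

# Both series peak each winter (52-week period), plus independent noise.
flu_cases = 100 + 80 * np.cos(2 * np.pi * weeks / 52) + rng.normal(0, 10, weeks.size)
basketball_searches = 50 + 40 * np.cos(2 * np.pi * weeks / 52) + rng.normal(0, 8, weeks.size)

r = np.corrcoef(flu_cases, basketball_searches)[0, 1]
print(f"Correlation between flu cases and basketball searches: {r:.2f}")
# Typically well above 0.9: a model trained on such terms "detects" winter,
# not flu, and fails when flu arrives out of season (H1N1 in summer 2009).
```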
Time Magazine’s “Google’s Flu Project Shows the Failings of Big Data” realizes:
GFT and other big data methods can be useful, but only if they’re paired with what the Science researchers call “small data”—traditional forms of information collection. Put the two together, and you can get an excellent model of the world as it actually is. Of course, if big data is really just one tool of many, not an all-purpose path to omniscience, that would puncture the hype just a bit. You won’t get a SXSW panel with that kind of modesty.
Scientific American’s “Why Big Data Isn’t Necessarily Better Data” points out:
Google itself concluded in a study last October that its algorithm for flu (as well as for its more recently launched Google Dengue Trends) were “susceptible to heightened media coverage” during the 2012-2013 U.S. flu season. “We review the Flu Trends model each year to determine how we can improve—our last update was made in October 2013 in advance of the 2013-2014 flu season,” according to a Google spokesperson. “We welcome feedback on how we can continue to refine Flu Trends to help estimate flu levels.”
The word “hubris” turns up in a number of articles about this “surprising” suggestion that algorithms drift.
Forget Google and its innocuous and possibly ineffectual flu data. The coverage of the problems with the Google Big Data demonstration has significance for those who bet big money that predictive systems can tame big data. For companies licensing Autonomy- or Recommind-type search and retrieval systems, the flap over flu trends makes clear that algorithmic methods require babysitting; that is, humans have to be involved, and that involvement may introduce outputs that wander off track. If you have used a predictive search system, you probably have encountered off-center, irrelevant results. The question “Why did the system display this document?” is one indication that predictive search may deliver a load of fresh bagels when you wanted a load of mulch.
For systems that do “pre-crime” or predictive analyses related to sensitive matters, uninformed “end users” can accept what a system outputs and take action. This is the modern version of “Ready, Fire, Aim.” Some of these actions are not quite as innocuous as over-estimating flu outbreaks. Uninformed humans without knowledge of the context and biases in the data and numerical recipes can find themselves mired in a swamp, not parked at the local Starbucks.
And what about Google? The flu analyses illustrate one thing: Google can fool itself in its effort to sell ads. Accuracy is not the point of Google or many other online information retrieval services.
Painful? Well, taking two aspirins won’t cure this particular problem. My suggestion? Come to grips with rigorous data analysis, algorithm behaviors, and old-fashioned fact checking. Big Data and fancy graphics are not, by themselves, solutions to the clouds of unknowing that swirl through marketing hyperbole. There is a free lunch if one wants to eat from trash bins.
Stephen E Arnold, March 15, 2014
Best Practices for Big Data Report
March 15, 2014
The article titled Report: Best Practices for Big Data Projects on GCN explores the forty-four page IBM report called Realizing the Promise of Big Data. The report includes a history of big data and explanations of its different applications in the public and private sectors. It further breaks down the usage of big data by federal and local government. The article offers some tips from the report, such as maintaining strong oversight:
“A staff with expertise in the technology, business and policy aspects of the project can help prevent any major surprises and ensure everything goes as planned. The development of key performance indicators is critical to big data projects. Both process and outcome measures are essential to the project’s success. Performance measures are centered on improving efficiency, such as lowering the cost of operations. Outcome measures focus on how the customers perceive the service being delivered.”
As might be clear from the quote, the report focuses on how best to design and implement a project for gaining insight from big data. At this point, most of us are still not sure what big data means, but somehow there are best practices for a fuzzy field of government effort.
Chelsea Kerwin, March 15, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
The Future of Big Data in the Classroom and Beyond
March 13, 2014
The Harvard Magazine article Why “Big Data” Is a Big Deal cheers for big data and analytics. The lengthy article touches on many of the details of big data. Harvard is noticing a thirst for big data and data analysis in almost every field, from government to sociology to the social sciences. The article notes that it is troubling that big data is not being shared among these fields, for legitimate reasons like privacy and less legitimate reasons like vanity among academics. On top of this, businesses now own more big data than academia does, and they certainly are not sharing. The article gets into many of the current uses of big data,
“In the public realm, there are all kinds of applications: allocating police resources by predicting where and when crimes are most likely to occur; finding associations between air quality and health; or using genomic analysis to speed the breeding of crops like rice for drought resistance.”
These uses are exciting and innovative. The article also argues that, given all of these areas of usefulness, big data must be brought into the “foundational courses for all undergraduates.” Teaching undergrads how to work with big data might address one of the large pitfalls the article pinpoints: when you are looking through such huge swaths of data, the possibility of false correlations is magnified, as the sketch below illustrates. Bringing data analytics into the core of undergraduate studies might help prevent the misuse of data. Overall, the article is a celebration of how big data is being used to help real people all over the world.
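To see why false correlations multiply with scale, consider this small Python sketch (our own illustration, not drawn from the Harvard article). It tests 1,000 completely random variables against a random outcome and still finds dozens of “significant” correlations by chance alone:

```python
# Illustration: with enough candidate variables, some will correlate with any
# outcome purely by chance, which is why large-scale data mining needs
# correction for multiple comparisons.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_samples, n_features = 100, 1000

outcome = rng.normal(size=n_samples)                   # random "outcome"
features = rng.normal(size=(n_features, n_samples))    # 1,000 unrelated variables

p_values = np.array([stats.pearsonr(f, outcome)[1] for f in features])
print("Spurious 'significant' correlations at p < 0.05:",
      int((p_values < 0.05).sum()))
# Expect roughly 50 of the 1,000 purely random variables to look "significant".
```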
Chelsea Kerwin, March 13, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Tableau Finds Success Since Going Public Last Year
March 12, 2014
Investment site TheStreet is very enthused about Tableau Software, which went public less than a year ago. In fact, it goes so far as to announce that “Tableau’s Building the ‘Google for Data’.” In this piece, writer Andrea Tse interviews Tableau CEO Christian Chabot. In her introduction, Tse notes that nearly a third of the company’s staff is in R&D—a good sign for future growth. She also sees the direction of Tableau’s research as wise. The article explains:
“The research and development team has been heavily focused on developing technology that’s free of skillset constraints, utilizable by everyone. This direction has been driven by the broad, corporate cultural shift to employee-centric, online-accessible data analytics, from the more traditional, hierarchical or top-down approach toward data analysis and dissemination.
“Tableau 9 and Tableau 10 that are in the product pipeline and soon-to-be-shipped Tableau 8.2 are designed to highlight ‘storytelling’ or visually striking data presentation.
“Well-positioned to ride the big data wave, Tableau shares, as of Tuesday’s [February 11] intraday high of $95, are now trading over 206% above its initial public offering price of $31 set on May 16.”
In the interview, Chabot shares his company’s research philosophy, touches on some recent large deals, and takes a gander at what is ahead. For example, his developers are currently working hard on a user-friendly mobile platform. See the article for details. Founded in 2003 and located in Seattle, Tableau Software grew from a project begun at Stanford University. Its priority is to help ordinary people use data to solve problems quickly and easily.
Cynthia Murrell, March 12, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Ontotext Offers Interesting Services
March 8, 2014
Ontotext delivers very interesting services to its clients. All of its products are built on semantic technology and on utilizing big data to benefit users. On its Web site, the company describes itself this way:
“Ontotext develops a unique portfolio of core semantic technologies. Our RDF engine powers some of the biggest world-renowned media sites. Our text-mining solutions demonstrate unsurpassed accuracy across different domains – from sport news to macro-economic analysis, scientific articles and clinical trial reports. We enable the next generation web of data and we can efficiently extract information from today’s structured web – be it recipes, adverts or anything else.”
It offers services for job extraction, hybrid semantics, and semantic publishing for industries such as life sciences, government, recruitment, libraries, publishing, and media. Ontotext has a range of products to help people harness semantic technology. The most interesting to us is the Semantic Biomedical Tagger, which is described as an extraction system that creates semantic annotations in biomedical texts. Ontotext also has the requisite search engine and semantic database. Its product line is fairly robust, and we intend to keep an eye on its offerings.
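For readers unfamiliar with semantic annotation, here is a minimal, hypothetical Python sketch using rdflib of what an annotation over a biomedical text might look like as RDF triples. The vocabulary and URIs are invented for illustration and are not Ontotext’s actual Semantic Biomedical Tagger output:

```python
# Hypothetical example of a semantic annotation expressed as RDF triples.
# The namespace, class names, and URIs are illustrative only.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/annotation/")

g = Graph()
ann = URIRef("http://example.org/doc42/annotation/1")
g.add((ann, RDF.type, EX.DrugMention))                        # what kind of entity was found
g.add((ann, EX.coversText, Literal("aspirin")))               # the text span annotated
g.add((ann, EX.beginOffset, Literal(128)))                    # where it occurs in the document
g.add((ann, EX.linkedEntity, URIRef("http://example.org/drug/Aspirin")))  # link to a knowledge base

print(g.serialize(format="turtle"))
```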
Whitney Grace, March 08, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Company Promises Rich Automated Content
March 7, 2014
Linguastat promises to transform big data, using the metaphor of “turning haystacks into gold.” Its Content Transformation Platform was developed for military intelligence with the goal of generating specific, user-defined content. Since its launch, Linguastat counts ecommerce companies, real estate groups, sports organizations, digital publishers, and others among its client list.
What caught our attention was this bullet point about the Content Transformation Platform:
“Automatically writes optimized and copyrightable content.”
Linguastat states that its platform produces thousands of product descriptions and digital stories a day for its clients. The company also notes that consumers are more likely to make online purchases when rich product content is available; the content is used to inform the consumer about the product. Its clients are in the market for usable content that comes at a low price.
While software is written to be extremely “smart” these days, we have a few doubts about the quality of the platform’s stories. Having never worked with the platform, we can only go on our own experience with automated stories. Often they lack the conversational, readable tone that consumers expect, and they tend to simply list facts in sentences. Cohesiveness is lost in automation. It is possible Linguastat has come across the magic formula that makes machine-written stories digestible. Then again, the company did promise to turn haystacks into gold.
Whitney Grace, March 07, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Watson Goes to Africa
February 26, 2014
Is big data the key to boosting Africa’s economic prowess? IBM seems to think so, and it is sending in its AI ambassador Watson to help with the continent’s development challenges. Watson is IBM’s natural language processing system that famously won Jeopardy in 2011. Now, Phys.org announces that “IBM Brings Watson to Africa.” The $100 million initiative is known as Project Lucy, named after the skeleton widely considered the earliest known human ancestor (Australopithecus, to be specific), discovered in Africa in 1974. (I would be remiss if I did not mention that an older skeleton, Ardipithecus, was found in 1994; there is still no consensus on whether this skeleton is really a “human ancestor,” though many scientists believe it is. But I digress.)
The write-up tells us:
“Watson technologies will be deployed from IBM’s new Africa Research laboratory providing researchers with a powerful set of resources to help develop commercially-viable solutions in key areas such as healthcare, education, water and sanitation, human mobility and agriculture.
“To help fuel the cognitive computing market and build an ecosystem around Watson, IBM will also establish a new pan-African Center of Excellence for Data-Driven Development (CEDD) and is recruiting research partners such as universities, development agencies, start-ups and clients in Africa and around the world. By joining the initiative, IBM’s partners will be able to tap into cloud-delivered cognitive intelligence that will be invaluable for solving the continent’s most pressing challenges and creating new business opportunities.”
IBM expects that with the help of its CEDD, Watson will be able to facilitate data collection and analysis on social and economic conditions in Africa, identifying correlations across multiple domains. The first two areas on Watson’s list are healthcare and education, both realms where improvement is sorely needed. The Center will coordinate with IBM’s 12 laboratories around the world and its new Watson business unit. (Wait, Watson now has its own business unit?) See the article for more on this hopeful initiative.
Cynthia Murrell, February 26, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Forecasting the Growth of Global Big Data
February 21, 2014
If you’re in a position to make decisions about how your company is going to handle Business Intelligence and Enterprise Search needs, you may want to have a look at Global Big Data Market 2014-2018, a new market research report offered by ReportLinker. PRNewswire reported on the publication.
The full report presents primary and secondary research conducted by TechNavio’s analysts, who
“forecast the Global Big Data market to grow at a CAGR of 34.17 percent over the period 2013-2018. One of the key factors contributing to this market growth is the need to upgrade business processes and improve productivity. The Global Big Data market has also been witnessing the increase in market consolidation. However, the lack of awareness about the potential of big data could pose a challenge to the growth of this market.”
The report covers the Americas and the EMEA and APAC regions and goes into in-depth analysis of the four key vendors in Big Data: Hewlett-Packard Co., IBM Corp., Oracle Corp., and Teradata Corp.
A host of other vendors is also covered in the full report, which addresses the key challenges of the global Big Data market and the forces driving developments. My guess is that the emerging market adoption of Hadoop is one of those forces.
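For perspective, here is a quick back-of-the-envelope Python sketch of what the report’s 34.17 percent CAGR implies. The 2013 base figure is purely hypothetical, since the article does not quote the report’s actual market-size numbers:

```python
# Back-of-the-envelope: what a 34.17% CAGR over 2013-2018 implies.
# The $10B 2013 base value is hypothetical; the report's actual figures
# are not quoted in the article.
cagr = 0.3417
base_2013 = 10.0  # hypothetical market size, in billions of dollars

for year in range(2013, 2019):
    size = base_2013 * (1 + cagr) ** (year - 2013)
    print(f"{year}: ${size:,.1f}B")

# A 34.17% CAGR means the market would grow roughly 4.3x over five years:
# (1 + 0.3417) ** 5 is approximately 4.35.
```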
Laura Abrahamsen, February 21, 2014
Sponsored by ArnoldIT.com, developer of Augmentext