Building Lovely Data Visualizations

June 25, 2014

Data is no longer just facts, figures, and black-and-white graphs. Data visualizations are becoming an increasingly important way to present and communicate data (and even Big Data). A few data visualization solutions are making big waves, and Visage is one on the rise. It is highlighted in the FastCompany article, “A Tool For Building Beautiful Data Visualizations.”

The article begins:

“Visage, a newly launched platform, provides custom templates for graphics. There are myriad tools on the market that do this (for a gander at 30 of them, check out this list), but Visage is the latest, and it’s gaining traction with designers at Mashable, MSNBC, and A&E. That’s due in part to Visage’s offerings, which are designed to be more flexible, and more personalized, than other services.”

More and more companies are working on ways to help organizations decipher and make sense of Big Data. But what good is the information if it cannot be effectively communicated? This is where data visualizations come in – helping to communicate complex data through clean visuals.

Emily Rae Aldridge, June 25, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Connotate Shows Growth And Webdata Browser

June 20, 2014

In February 2014, NJTC TechWire published the article “Connotate Announces 25% YOY Growth In Total Contract Value For 2013.” Connotate has made a name for itself as a leading provider of Webdata extraction and monitoring solutions. The company’s revenue grew 25% in 2013, and other positives for Connotate included the release of Connotate 4.0, a new Web site, and new multi-year deal renewals. On top of the record growth, BIIA reports that “Connotate Launches Connotate4,” a Web browser that simplifies and streamlines Webdata extraction. Connotate4 will do more than provide users with a custom browser:

• “Inline data transformations within the Agent development process is a powerful new capability that will ease data integration and customization.

• Enhanced change detection with highlighting can be requested during the Agent development process via a simple point-and-click checkbox, enabling highlighted change detection that is easily illustrated at the character, word or phrase level.

• Parallel extraction tasks makes it faster to complete tasks, allowing even more scalability for even larger extractions.

• Build and expand capabilities turn the act of re-using a single Agent for related extraction tasks a one-click event, allowing for faster Agent creation.

• A simplified user interface enabling simplified and faster Agent development.”

Connotate brags that the new browser will give users access to around 95% of Webdata and will adapt as new technologies emerge. Connotate aims to place itself in the next wave of indispensable enterprise tools.
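Connotate’s browser is proprietary, but the parallel extraction idea highlighted above is easy to illustrate generically. Below is a minimal sketch, not Connotate’s API, that fetches several pages concurrently and pulls their titles; the URLs and the regular expression are placeholder assumptions.

```python
# Generic parallel web data extraction sketch -- not Connotate's API.
# Fetches a handful of pages concurrently and extracts each page title.
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = [  # placeholder URLs for illustration
    "https://example.com/",
    "https://example.org/",
    "https://example.net/",
]

def extract_title(url):
    """Download a page and return (url, <title> text or 'n/a')."""
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return url, match.group(1).strip() if match else "n/a"

if __name__ == "__main__":
    # Running the extraction tasks in parallel is what lets large jobs scale.
    with ThreadPoolExecutor(max_workers=3) as pool:
        for url, title in pool.map(extract_title, URLS):
            print(f"{url} -> {title}")
```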

Whitney Grace, June 20, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Rising Startup Tamr Has Big Plans for Data Cleanup

June 13, 2014

An article on Gigaom is titled “Michael Stonebraker’s New Startup, Tamr, Wants to Help Get Messy Data in Shape.” With $16 million in backing from Google Ventures and New Enterprise Associates, Stonebraker and partner Andy Palmer are working to crack the ongoing problem of data transformation and normalization. The article explains,

“Essentially, the Tamr tool is a data cleanup automation tool. The machine-learning algorithms and software can do the dirty work of organizing messy data sets that would otherwise take a person thousands of hours to do the same, Palmer said. It’s an especially big problem for older companies whose data is often jumbled up in numerous data sources and in need of better organization in order for any data analytic tool to actually work with it.”

Teaching machines some human-like insight into repetitive cleanup work just might be the trick. Tamr still requires a human in the management seat, known as the data steward: someone who reviews a proposed match between records from two separate data sets and decides whether the relationship is a good one. Tamr has been compared to Trifacta, but Palmer insists that Tamr is preferable for its ability to compare thousands of data sources under a data steward’s oversight. He also noted that Trifacta co-founder Joe Hellerstein was a student of Stonebraker’s in a PhD program.
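Tamr’s algorithms are proprietary, but the basic workflow, machine-scored candidate matches handed to a human steward, can be sketched generically. The record lists and similarity thresholds below are illustrative assumptions, not Tamr’s implementation.

```python
# Generic sketch of machine-assisted record matching -- not Tamr's implementation.
# Score candidate pairs from two sources and queue the ambiguous ones for a steward.
from difflib import SequenceMatcher

source_a = ["Acme Corp., 100 Main St", "Globex Corporation", "Initech LLC"]
source_b = ["ACME Corp, 100 Main Street", "Globex Corp.", "Umbrella Inc."]

def similarity(left, right):
    """Rough string similarity between two records (0.0 to 1.0)."""
    return SequenceMatcher(None, left.lower(), right.lower()).ratio()

AUTO_ACCEPT = 0.90   # assumed thresholds for illustration
AUTO_REJECT = 0.50

for a in source_a:
    for b in source_b:
        score = similarity(a, b)
        if score >= AUTO_ACCEPT:
            verdict = "auto-match"
        elif score <= AUTO_REJECT:
            verdict = "auto-reject"
        else:
            verdict = "send to data steward"   # a human decides the relationship
        print(f"{score:.2f}  {verdict:20}  {a!r} <-> {b!r}")
```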

Chelsea Kerwin, June 13, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Palantir Advises More Abstraction for Less Frustration

June 10, 2014

At this year’s Gigaom Structure Data conference, Palantir’s Ari Gesher offered an apt parallel for the data field’s current growing pains: using computers before the dawn of operating systems. Gigaom summarizes his explanation in, “Palantir: Big Data Needs to Get Even More Abstract(ions).” Writer Tom Krazit tells us:

“Gesher took attendees on a bit of a computer history lesson, recalling how computers once required their users to manually reconfigure the machine each time they wanted to run a new program. This took a fair amount of time and effort: ‘if you wanted to use a computer to solve a problem, most of the effort went into organizing the pieces of hardware instead of doing what you wanted to do.’

“Operating systems brought abstraction, or a way to separate the busy work from the higher-level duties assigned to the computer. This is the foundation of modern computing, but it’s not widely used in the practice of data science.

“In other words, the current state of data science is like ‘yak shaving,’ a techie meme for a situation in which a bunch of tedious tasks that appear pointless actually solve a greater problem. ‘We need operating system abstractions for data problems,’ Gesher said.”

An operating system for data analysis? That’s one way to look at it, I suppose. The article invites us to click through to a video of the session, but as of this writing it is not functioning. Perhaps they will heed the request of one commenter and fix it soon.

Based in Palo Alto, California, Palantir focuses on improving the methods their customers use to analyze data. The company was founded in 2004 by some folks from PayPal and from Stanford University. The write-up makes a point of noting that Palantir is “notoriously secretive” and that part(s) of the U.S. government can be found among its clients. I’m not exactly sure, though, how that ties into Gesher’s observations. Does Krazit suspect it is the federal government calling for better organization and a simplified user experience? Now, that would be interesting.

Cynthia Murrell, June 10, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Data Journalism Handbook

June 2, 2014

In the fast moving world of technology, updated resources are especially important. The Data Journalism Handbook is a new one that is worth a second look. Available in a variety of languages, the handbook aims to be a primer for the emerging world of data journalism.

The overview states:

“The Data Journalism Handbook is a free, open source reference book for anyone interested in the emerging field of data journalism. It was born at a 48 hour workshop at MozFest 2011 in London. It subsequently spilled over into an international, collaborative effort involving dozens of data journalism’s leading advocates and best practitioners.”

Freely available online via a Creative Commons license, the handbook is an initiative of the European Journalism Centre. Download your free copy today to see if data journalism is a field in which you can participate.

Emily Rae Aldridge, June 2, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Centrifuge Says It Offers More Insights

May 29, 2014

According to a press release on Virtual Strategy, Centrifuge Systems, a company that develops big data software, has created four new data connectors within its visual link analysis software. “Centrifuge Expands Their Big Data Discovery Integration Footprint” explains that the additional data connectors will help users make better business decisions.

“ ‘Without the ability to connect disparate data – the potential for meaningful insight and actionable business decisions is limited,’ says Stan Dushko, Chief Product Officer at Centrifuge Systems. ‘It’s like driving your car with a blindfold on. We all take the same route to the office every day, but wouldn’t it be nice to know that today there was an accident and we had the option to consider an alternate path.’ ”

The new connectors offer real-time access to the ANX file structure, JSON, LDAP, and Apache Hadoop with Cloudera Impala. Centrifuge’s goal is to add more data points that give users a broader and more detailed perspective on their data. Centrifuge likes to think of itself as the business intelligence tool of the future. Other companies, though, offer similar functions with their software. What makes Centrifuge different from the competition?
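Centrifuge does not publish its connector API, but the idea of a uniform connector layer over dissimilar sources is straightforward to sketch. The class names and the local JSON file below are assumptions for illustration only.

```python
# Generic data-connector sketch -- illustrative only, not Centrifuge's API.
# Each connector exposes the same fetch() interface over a different source.
import json
from abc import ABC, abstractmethod

class Connector(ABC):
    @abstractmethod
    def fetch(self):
        """Yield records from the underlying source as plain dicts."""

class JsonFileConnector(Connector):
    def __init__(self, path):
        self.path = path

    def fetch(self):
        # Assumes the file holds a top-level JSON array of records.
        with open(self.path, encoding="utf-8") as handle:
            yield from json.load(handle)

# An LDAP or Impala connector would implement the same interface with its own
# client library; the link analysis code never needs to know which source it has.
def gather_records(connectors):
    records = []
    for connector in connectors:
        records.extend(connector.fetch())
    return records

if __name__ == "__main__":
    # "people.json" is a hypothetical local file used only for this sketch.
    print(gather_records([JsonFileConnector("people.json")]))
```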

Whitney Grace, May 29, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Fusion Problems

May 29, 2014

Brett Slatkin at One Big Fluke makes a thought-provoking point in his blog post “Data Fusion Has No Error Bounds” about how error-prone this kind of data analysis can be. Slatkin relates how he has come across many data fusion issues in his career. Data fusion problems occur when people want to merge two or more data sets that share no linking source. Companies have tried to rectify data fusion problems, but no matter how they advertise their software, code, or gimmick, Slatkin shows that there will always be some margin of error. How does he do it? Math.

Slatkin illustrates data fusion with three data sets that have little to no relation. He outlines all the possible outcomes of each data set and ends with a portion that cannot be measured. He shows that despite all of the careful planning, mapping out the possible outcomes yields a phantom zone. His response to this simple outcome is:

“There are two outcomes in data fusion: you measure so you can calculate the error bars, or you make a wild guess.”
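The argument is easy to see with a toy example (mine, not Slatkin’s): if one data set says 60 percent of customers are mobile users and a separate, unlinked data set says 50 percent are repeat buyers, the overlap between the two groups was never measured, so any fused figure is a guess within wide bounds.

```python
# Toy illustration of the data fusion problem (my example, not Slatkin's).
# Two unlinked data sets give marginal rates; the joint rate is unmeasured,
# so all we can state are bounds, not an error bar around a single value.
p_mobile = 0.60   # measured in data set 1
p_repeat = 0.50   # measured in data set 2

lower = max(0.0, p_mobile + p_repeat - 1.0)   # the overlap cannot be smaller
upper = min(p_mobile, p_repeat)               # the overlap cannot be larger

print(f"P(mobile AND repeat) lies somewhere in [{lower:.2f}, {upper:.2f}]")
# Output: P(mobile AND repeat) lies somewhere in [0.10, 0.50]
# Without a shared key linking the two sets, nothing narrows that range:
# you either measure the link, or you make a wild guess.
```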

What have we learned from this? Despite all attempts to overcome any errors, data analysis is still error prone. Big data vendors will not like that.

Whitney Grace, May 29, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Interview with Jeff Catlin on the Future of Enterprise Data

May 22, 2014

The interview titled “Text Analytics 2014: Jeff Catlin, Lexalytics,” published on Breakthrough Analysis, may be overstating its case when it is billed as breakthrough analysis. Most of the questions cover state-of-the-industry topics and Lexalytics promotion. Still, Catlin offers insight into the world of enterprise data and the future of the industry. For example, when asked about new features for 2014 and the near future, Catlin responded,

“As a company, Lexalytics is tackling both the basic improvements and the new features with a major new release, Salience 6.0, which will be landing sometime in the second half of the year. The core text processing and grammatic parsing of the content will improve significantly, which will in turn enhance all of our core features of the engine. Additionally, this improved grammatic understanding will allow us to be the key to detecting intention, which is the big new feature in Salience 6.0.”
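Lexalytics has not published how Salience detects intention, but the general idea, mapping patterns in text to an intent label, can be hinted at with a trivial rule-based sketch. The cue phrases and labels below are invented for illustration and have nothing to do with Salience 6.0’s actual approach.

```python
# Trivial rule-based intent sketch -- an invented illustration, not Salience 6.0.
INTENT_CUES = {
    "purchase": ["i want to buy", "looking to purchase", "where can i order"],
    "churn":    ["cancel my account", "switching to", "done with this service"],
    "support":  ["how do i", "not working", "need help with"],
}

def detect_intent(text):
    """Return the first intent whose cue phrase appears in the text."""
    lowered = text.lower()
    for intent, cues in INTENT_CUES.items():
        if any(cue in lowered for cue in cues):
            return intent
    return "unknown"

print(detect_intent("I'm looking to purchase a new phone next week"))  # purchase
print(detect_intent("This app is not working after the update"))       # support
```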

Catlin repeats in several of his answers that the industry is in flux, and that vendors can only scramble to keep up, even going so far as to compare 2013 and 2014 enterprise data to the Berlin Wall. He describes two “fronts”, one involving improving core technology, and the other focused on vertical market prospects.

Chelsea Kerwin, May 22, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Using Real Data to Mislead

May 14, 2014

Viewers of graphs, beware! Data visualization has been around for a very long time, but it has become ubiquitous since the onset of Big Data. Now, the Heap Data Blog warns us to pay closer attention in, “How to Lie with Data Visualization.” Illustrating his explanation with clear examples, writer Ravi Parikh outlines three common ways a graphic can be manipulated to present a picture that actually contradicts the data used to build it. The first is the truncated Y-axis. Parikh writes:

“One of the easiest ways to misrepresent your data is by messing with the y-axis of a bar graph, line graph, or scatter plot. In most cases, the y-axis ranges from 0 to a maximum value that encompasses the range of the data. However, sometimes we change the range to better highlight the differences. Taken to an extreme, this technique can make differences in data seem much larger than they are.”

The example here presents two charts on rising interest rates. On the first, the Y-axis ranges from 3.140% to 3.154% — a narrow range that makes the rise from 2008 to 2012 look quite dramatic. However, on the next chart the rise seems nigh non-existent; this one presents a more relevant span of 0.00% to 3.50% on the Y-axis.
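It only takes a few lines to see the trick at work. The sketch below uses made-up interest-rate figures, not the article’s actual data, to draw the same series twice: once with a truncated y-axis and once starting at zero.

```python
# Truncated y-axis versus full y-axis, using made-up interest-rate figures.
import matplotlib.pyplot as plt

years = [2008, 2009, 2010, 2011, 2012]
rates = [3.141, 3.144, 3.147, 3.150, 3.154]   # invented values for illustration

fig, (truncated, honest) = plt.subplots(1, 2, figsize=(10, 4))

truncated.plot(years, rates, marker="o")
truncated.set_ylim(3.140, 3.154)              # narrow range: the rise looks dramatic
truncated.set_title("Truncated y-axis")

honest.plot(years, rates, marker="o")
honest.set_ylim(0.0, 3.5)                     # full range: the rise nearly vanishes
honest.set_title("Y-axis starting at zero")

for ax in (truncated, honest):
    ax.set_xlabel("Year")
    ax.set_ylabel("Interest rate (%)")

plt.tight_layout()
plt.show()
```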

Another method of misrepresentation is to present numbers, particularly revenue, cumulatively instead of from year-to-year or quarter-to-quarter. Parikh notes that Apple’s iPhone sales graph from last September is a prominent example of this tactic.

Finally, one can mislead one’s audience by violating conventions. The real-world example here presents a pie chart in which the slices add up to 193%. The network that created it had to know that cursory viewers would pay more attention to the bright colors than to the numbers. The write-up observes:

“The three slices of the pie don’t add up to 100%. The survey presumably allowed for multiple responses, in which case a bar chart would be more appropriate. Instead, we get the impression that each of the three candidates have about a third of the support, which isn’t the case.”

See the article for more examples, but the upshot is clear. Parikh concludes:

“Be careful when designing visualizations, and be extra careful when interpreting graphs created by others. We’ve covered three common techniques, but it’s just the surface of how people use data visualization to mislead.”

Cynthia Murrell, May 14, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

The Hadoop Elephant Offers A Helping Trunk

May 13, 2014

It is time for people to understand that relational databases were not made to handle big data. There is just too much data jogging around in servers and mainframes, and the terabytes run circles around relational database frameworks. It is sort of like a smart fox toying with a dim hunter. It is time for more robust and reliable software, like Hadoop. GCN says that there are “5 Ways Agencies Can Use Hadoop.”

Hadoop is an open source programming framework that spreads data across server clusters. It is faster and less expensive than proprietary software. The federal government is always searching for ways to slash spending, and if it turns to Hadoop it might save a bit in tech costs.

“It is estimated that half the world’s data will be processed by Hadoop within five years.  Hadoop-based solutions are already successfully being used to serve citizens with critical information faster than ever before in areas such as scientific research, law enforcement, defense and intelligence, fraud detection and computer security. This is a step in the right direction, but the framework can be better leveraged.”

The five ways the government can use Hadoop are to store and analyze unstructured and semi-structured data, improve initial discovery and exploration, make all data available for analysis, stage data for warehouses and analytic data stores, and lower the cost of data storage.
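As a concrete starting point for the first of these, analyzing unstructured data, the classic word-count job can run on a cluster with nothing more than two small scripts via Hadoop Streaming. This is a generic sketch, not taken from the GCN article; the file names and HDFS paths in the usage line are placeholders.

```python
# mapper.py -- emit "word<TAB>1" for every word read from stdin.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word.lower()}\t1")
```

```python
# reducer.py -- sum the counts for each word (Hadoop delivers input sorted by key).
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

A job like this would be launched with something along the lines of hadoop jar /path/to/hadoop-streaming.jar -files mapper.py,reducer.py -mapper "python mapper.py" -reducer "python reducer.py" -input /user/data/raw -output /user/data/wordcount, with the jar and directory paths adjusted to the local cluster.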

So can someone explain why this has not been done yet?

Whitney Grace, May 13, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
