Security, Data Analytics Make List of Predicted Trends in 2015

January 9, 2015

The article on ZyLab titled “Looking Ahead to 2015” sums up the latest areas of focus at the end of one year and the beginning of the next. Obviously, security is at the top of the list. According to the article, incidents of breaches in security grew 43% in 2014. We assume Sony would be the first to agree that security is of the utmost importance to most companies. The article goes on to predict that audio data will become increasingly important as evidence:

“Audio evidence brings many challenges. For example, the review of audio evidence can be more labor intensive than other types of electronically stored information because of the need to listen not only to the words but also take into consideration tone, expression and other subtle nuances of speech and intonation…As a result, the cost of reviewing audio evidence can quickly become prohibitive and with only a proportional of the data relevant in most cases.”

The article also briefly discusses various data sources, data analytics and information governance in its predictions of the trends for 2015. It makes a point of focusing on the growth of data and types of data sources, which will hopefully coincide with an improved ability to discover the sort of insights that companies desire.

Chelsea Kerwin, January 09, 2015

Sponsored by ArnoldIT.com, developer of Augmentext

Inside the Creative Commons Dataset from Yahoo and Flickr

January 5, 2015

These are not our grandparents’ photo albums. With today’s technology, photos and videos are created and shared at a truly astounding pace. Much of that circulation occurs on Flickr, which teamed up with Yahoo to create a cache of nearly 100 million photos and almost 800,000 videos with Creative Commons licenses for us all to share. Code.flickr.com gives us the details in “The Ins and Outs of the Yahoo Flickr Creative Commons 100 Million Dataset.” Researchers Bart Thomée and David A. Shamma report:

“To understand more about the visual content of the photos in the dataset, the Flickr Vision team used a deep-learning approach to find the presence of visual concepts, such as people, animals, objects, events, architecture, and scenery across a large sample of the corpus. There’s a diverse collection of visual concepts present in the photos and videos, ranging from indoor to outdoor images, faces to food, nature to automobiles.”

The article goes on to explore the frequency of certain tags, both user-annotated and machine-generated. The machine tags include factors like time, location, and camera used, suggesting rich material for data analysts to play with. The researchers conclude with praise for their team’s project:

“The collection is one of the largest released for academic use, and it’s incredibly varied—not just in terms of the content shown in the photos and videos, but also the locations where they were taken, the photographers who took them, the tags that were applied, the cameras that were used, etc. The best thing about the dataset is that it is completely free to download by anyone, given that all photos and videos have a Creative Commons license. Whether you are a researcher, a developer, a hobbyist or just plain curious about online photography, the dataset is the best way to study and explore a wide sample of Flickr photos and videos.”

See the article for more details on those tags found within the massive dataset. The whole assemblage is available for download from Yahoo Labs.

Cynthia Murrell, January 05, 2015


A SASsy Hadoop Data Connection

January 2, 2015

It has been a while since we posted an article that highlights Hadoop’s capabilities and benefits. The SAS Data Management blog talks about how data sources are increasing and Hadoop can help companies organize and use their data: “The Snap, Crackle, And Pop Of Data Management On Hadoop.”

SAS is a leading provider of data management solutions, including an entire line based on the open source Hadoop software. They offer several ways to control data, including the FROM, WITH, and IN options. While the names are simple, they sum up the processes in one word.

The SAS FROM allows users to connect to the Hadoop cluster. It connects to Hadoop using a SAS/ACCESS engine, which collects metadata built in Hadoop and makes it available in the data flows. This allows the software to make performance decisions without user intervention.

SAS WITH is more complicated because of its give-and-take function:

“The SAS WITH story provides transformation capabilities not yet available in Hadoop. UPDATE and DELETE are standard SQL transformations used in a variety of data processing programs. Hive does not yet support these functions, but you can utilize PROC IMSTAT (part of the WITH story) to lift a table or partition into memory and perform these functions in parallel. The table or partition could then be reincorporated into the Hive table, alleviating the need to truncate and reload from an RDBMS data source.”
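The lift, modify, and reincorporate pattern the quote describes can be illustrated outside SAS. What follows is a minimal pure-Python sketch, not PROC IMSTAT or Hive code: the `partitions` dictionary, the row layout, and every helper name are hypothetical, and a plain dict stands in for a partitioned table that lacks row-level UPDATE and DELETE.

```python
# Hypothetical stand-in for a partitioned, append-only table:
# partition key -> list of row dicts. Not real Hive or SAS code.
partitions = {
    "2014-12": [
        {"id": 1, "status": "open", "amount": 10.0},
        {"id": 2, "status": "void", "amount": 5.0},
    ],
    "2015-01": [
        {"id": 3, "status": "open", "amount": 7.5},
    ],
}


def lift(table, key):
    """Copy one partition into memory so it can be edited freely."""
    return [dict(row) for row in table[key]]


def update_rows(rows, predicate, changes):
    """In-memory equivalent of SQL UPDATE ... SET ... WHERE."""
    return [{**row, **changes} if predicate(row) else row for row in rows]


def delete_rows(rows, predicate):
    """In-memory equivalent of SQL DELETE ... WHERE."""
    return [row for row in rows if not predicate(row)]


def reincorporate(table, key, rows):
    """Swap the edited partition back in; other partitions are untouched."""
    table[key] = rows


rows = lift(partitions, "2014-12")
rows = delete_rows(rows, lambda r: r["status"] == "void")
rows = update_rows(rows, lambda r: r["id"] == 1, {"status": "closed"})
reincorporate(partitions, "2014-12", rows)
```

Only the affected partition is rewritten here, which mirrors the quote’s point about avoiding a full truncate and reload.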

SAS IN has the most advanced coding capabilities for data management. It allows users to run a program in which eight functions execute in parallel against Hadoop data tables. They can also use the DS2 language to perform difficult transformations of a table in parallel.

SAS’s three new Hadoop interactions allow for better streamlining of data from multiple sources and provide more insight into industry applications.

Whitney Grace, January 02, 2015

Mastering Data Quality Requires Change

December 31, 2014

Big data means big changes for data management and ensuring its quality. Computer users, especially those ingrained in their ways, have never been keen on changing their habits. Add training sessions and meetings, and you have a general idea of what it takes to instill data acceptance. Dylan Jones at SAS’s Data Roundtable wrote an editorial, “Data Quality Mastery Depends On Change Management Essentials.”

Jones writes that data management is still viewed as a strict IT domain and data quality suffers from it. It requires change management to make other departments understand the necessity of the changes.

Change management involves:

• “Ownership and leadership from the top

• Alignment with the overall strategy of the organization

• A clear vision for data quality

• Constant dialogue and consultation”

Jones notes that leaders are difficult to work with when it comes to change implementation, because they do not see what the barriers are. It translates to a company’s failure to adapt and learn. He recommends having an outside consultant, with an objective perspective, help when trying to make big changes.

Jones makes good suggestions, but he offers no advice on how to feasibly accomplish the task. He also needs to consider that data quality is constantly changing as new advances are made. Is he aware that some users cannot keep up with the daily changes?

Whitney Grace, December 31, 2014

Data Analysis by Algorithm

December 22, 2014

The folks at Google may have the answer for the dearth of skilled data analysts out there. Unfortunately for our continuing job crisis, that answer does not lie in (human) training programs. Google Research Blog discusses “Automatically Making Sense of Data.” Writers Kevin Murphy and David Harper ask:

“What if one could automatically discover human-interpretable trends in data in an unsupervised way, and then summarize these trends in textual and/or visual form? To help make progress in this area, Professor Zoubin Ghahramani and his group at the University of Cambridge received a Google Focused Research Award in support of The Automatic Statistician project, which aims to build an ‘artificial intelligence for data science’.”

Trends in time-series data have thus far provided much fodder for the team’s research. The article details an example involving solar-irradiance levels over time, and discusses modeling the data using Gaussian process models. Murphy and Harper report on the Cambridge team’s progress:

“Prof Ghahramani’s group has developed an algorithm that can automatically discover a good kernel, by searching through an open-ended space of sums and products of kernels as well as other compositional operations. After model selection and fitting, the Automatic Statistician translates each kernel into a text description describing the main trends in the data in an easy-to-understand form.”
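To make the “sums and products of kernels” idea concrete, here is a minimal sketch of a single expansion step in such a search. It is not the Automatic Statistician’s code: the three base kernels, their fixed hyperparameters, and the names `BASE` and `expand` are assumptions chosen for illustration, and the scoring step (fitting each candidate to data and keeping the best) is deliberately left out.

```python
import math


# Base kernels with fixed, illustrative hyperparameters (assumed, not fitted).
def se(x, y, length=1.0):
    """Squared-exponential kernel: smooth local variation."""
    return math.exp(-((x - y) ** 2) / (2 * length ** 2))


def periodic(x, y, period=1.0, length=1.0):
    """Periodic kernel: repeating structure with the given period."""
    return math.exp(-2 * math.sin(math.pi * abs(x - y) / period) ** 2 / length ** 2)


def linear(x, y):
    """Linear kernel: straight-line trends."""
    return x * y


BASE = {"SE": se, "PER": periodic, "LIN": linear}


def add(k1, k2):
    """The sum of two kernels is itself a valid kernel."""
    return lambda x, y: k1(x, y) + k2(x, y)


def mul(k1, k2):
    """The product of two kernels is itself a valid kernel."""
    return lambda x, y: k1(x, y) * k2(x, y)


def expand(name, kernel):
    """One search step: combine a kernel with every base kernel by + and *."""
    candidates = {}
    for base_name, base_kernel in BASE.items():
        candidates[f"({name} + {base_name})"] = add(kernel, base_kernel)
        candidates[f"({name} * {base_name})"] = mul(kernel, base_kernel)
    return candidates


candidates = expand("SE", se)
```

Each call to `expand` turns one kernel into six new candidates, so repeating it on the best candidate so far yields the open-ended space the quote mentions. A real system would score every candidate against the data before choosing which one to expand next.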

Naturally, the team is going on to work with other kinds of data. We wonder—have they tried it on Google Glass market projections?

There’s a simplified version available for demo at the project’s website, and an expanded version should be available early next year. See the write-up for the technical details.

Cynthia Murrell, December 22, 2014


Hidden Data In Big Data

December 15, 2014

Did you know that there was hidden data in big data? Okay, that makes a little sense given that big data software is designed to find the hidden trends and patterns, but RCR Wireless’ “Discovering Big Data Unknowns” article points out that there is even more data left unexplored. Why? Because people are only searching in the known areas. What about the unknown areas?

The article focuses on Katherine Matsumoto of Attensity and how she uses natural language processing to “social listen” in these grey areas. Attensity is a company that specializes in natural language processing analytics to understand the content around unstructured data, the white noise of big data. Attensity views the Internet as the world’s largest consumer focus group, and it helps its clients understand consumers’ habits. The new Attensity Q platform enables users to identify these patterns in real time and detect big data unknowns.

“The company’s platform combines sentiment and trend analysis with geospatial information and information on trend influencers, and said its approach of analyzing the conversations around emerging trends enables it to act as an “early warning” system for market shifts.”

The biggest problem Attensity faces is filtering out spam and understanding the data’s context. Finding the context is the main way social data can be harnessed for companies.

Sifting through the white noise for the useful information is a hard job. Can the same technology be applied to online ads to filter out the scams from legitimate ones?

Whitney Grace, December 15, 2014

OpenText Success Story with South African Distell

December 2, 2014

The article titled “Distell Supports Business Growth Through Improved Information Management” on OpenText tells the story of the booming business Distell, a South African beverage producer. Since opening in 2000, the company has grown quickly, and the speed of the growth resulted in unstructured data being stored in unconnected silos. Needless to say, this was detrimental to the company’s efficiency. The article explains,

“Today, there are over 13 million information assets in the Distell Enterprise Content Management (ECM) platform or repository; with tens of thousands of items being added weekly. Helping make sense of this wealth of corporate intellectual property are OpenText ECM solutions, from archiving to document management and secure file sharing in the cloud. This collaborative, searchable, secure repository enables marketing, sales, operations, production and service functions in one continent to access information from peers across the globe.”

The article seems to convey an OpenText success story, with improved collaboration and efficiency throughout Distell. The company boasts around 30 new employees a month, and ECM’s largest benefits are considered to be productivity and continuity of services. No word on how much this implementation cost, but you can almost hear an OpenText representative asking, “Can you put a price on empowering your employees?”

Chelsea Kerwin, December 02, 2014


Announcing Kapow Enterprise 9.3

December 1, 2014

We can’t blame a company for crowing about new features. On its site, Kapow Software announces a new version of its business platform in “New in Kapow Enterprise 9.3.” The write-up emphasizes:

“Kapow Enterprise 9.3 introduces new capabilities that give organizations greater flexibility, speed and reach in turning Big Data into business insights. These enhancements extend Kapow Enterprise as the leading data integration platform to access, integrate, deliver and explore data from the widest variety of internal and external sources.”

The new version boasts added flexibility and coverage when acquiring data across disparate sources. It also offers enhanced data distribution and exploration; of particular value to many will be the platform’s visual presentation of data through auto-generated graphs and tables, both of which update themselves as users add and remove filters. Kapow has also improved its Kapplets, the feature that lets users easily publish web apps that combine information into easily-digested interactive presentations. See the post for more information, or contact the company to request a demo.

Priding itself on its products’ flexibility, integration-and-automation firm Kapow serves businesses of all sizes around the world. Headquartered in Palo Alto, California, Kapow was founded in 2005. The promising company was snapped up by process-applications outfit Kofax in 2013. Kofax is also based in Palo Alto, and was founded back in 1991.

Cynthia Murrell, December 01, 2014


Attensity Finds New Data Trends But Is It Different Than Anyone Else?

November 26, 2014

Enterprise Apps Today has an article called “Attensity Boosts Ability To Discover ‘Unknown’ Trends In Data,” discussing how Attensity was updated with new features to detect themes in real-time social data, catch spam, and make it easier to compose and filter queries. Before Attensity’s new software updates, social analytics tools used mentions to measure interest in products. Mentions alone are not the most reliable way to see if a product is successful.

The new Attensity Q tracks themes, trends, anomalies, and events around a product in the context of online conversations. This makes it easier to build new vocabularies and brand-unique terms into queries.

“Social analytics has largely been limited up to this point by forming hypotheses and testing them – the hunting and pecking for insights that traditional search requires you to do,” [Senior Project Manager and NLP Strategist Katherine] Matsumoto said. “But there is a growing need for our customers to be presented with findings that they didn’t know to look for. These findings may be within their search topic, adjacent to it or many degrees removed through nested relationships.”

Attensity Q has more applications than retail. It can be used by legal departments to detect fraudulent activities and by HR departments to target areas for improvement. It could even be used with healthcare patient data to track unusual patterns and offer a better diagnosis.

Rather than bragging about big data’s possibilities, Attensity is describing some practical applications and their uses.

Whitney Grace, November 26, 2014

An Expert WAND Partnership

November 24, 2014

Data is messy and needs to be kept clean. Data on a large, enterprise scale is a nightmare to neat freaks, because without an organizational hierarchy it would take years to sift through. WAND Inc.’s corporate blog posted some exciting news, “Expert System And WAND Partner For A More Effective Management Of Enterprise Information.” WAND is known throughout big data as the leader in enterprise taxonomies, while Expert System is renowned for its semantic technology.

The goal of the partnership is to help enterprise systems make their data more findable, manage better client relationships, and decrease operational risks. While the partnership will affect enterprise systems overall, there are three main factors that will overhaul the enterprise content management process:

1 “Taxonomy selection: WAND offers the biggest library of out-of-the-box taxonomies available on the market today. By selecting one of the available sector specific taxonomies, customers can speed up significantly their implementation time without compromising their specific classification requirements.

2 Automatic Classification based on the selected taxonomy: once the customer chooses the taxonomy, Expert System makes a full set of tools available to define the semantic based categorization rules and the engine that enables the automatic categorization of all the enterprise content.

3 Native integration with the most common document and collaboration systems, including Microsoft SharePoint.”
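As a rough illustration of the second point, automatic classification against a chosen taxonomy, the sketch below assigns categories using simple keyword rules. It is not Expert System’s engine or a WAND taxonomy: the two categories, their keywords, and the `classify` function are hypothetical, and bare keyword matching stands in for the semantic rules the quote describes.

```python
import re

# Hypothetical two-category taxonomy with keyword rules; a real deployment
# would use a vendor taxonomy and semantic rules, not bare keywords.
TAXONOMY_RULES = {
    "Finance/Invoices": {"invoice", "payment", "remittance"},
    "HR/Recruiting": {"candidate", "interview", "resume"},
}


def classify(text, rules=TAXONOMY_RULES, min_hits=1):
    """Return taxonomy categories whose keywords appear in the text,
    ordered by number of distinct keyword hits (most hits first)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scored = []
    for category, keywords in rules.items():
        hits = len(words & keywords)
        if hits >= min_hits:
            scored.append((hits, category))
    # Sort by hits descending, then by category name for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [category for _, category in scored]


doc = "Please schedule an interview with the candidate and attach the resume."
labels = classify(doc)
```

Starting from a ready-made taxonomy means only the rules per category need tuning, which is the time saving the first point in the list claims.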

WAND and Expert System’s combined forces will allow enterprise systems to make their data more findable. While the partnership is beneficial, it reads like most big data relationships. What makes it different, however, are the names attached.

Whitney Grace, November 24, 2014
