A Webinar Adds Value to Data

January 11, 2014

Connotate is offering a webinar called “Big Data: The Portal To New Value Propositions.” The webinar summary explains what most big data people already know: with all the new data available, there are new ways to cash in. The summary goes on to say that people generate data every day with everything they do on the Internet and that companies have been collecting this information for years. Did you also know that, in addition to a physical identity, people have a virtual identity? This is very basic knowledge. Finally, the summary gets to the point: business value propositions will supply new opportunities but also lead to possible risks.

After the summary, there is a list of topics that will be covered in the webinar:

· “Review the process of creating big data-based value propositions and illustrate many examples in science and health, finance, publishing and advertising.

· Explore which companies are successful, which are not and why.

· Review the mechanics:  How to use unstructured content and combine it with structured data.

· Focus on data extraction, the “curation” process, the organization of value-based schemas and analytics.

· Analyze the ultimate delivery of value propositions that rest on the unique combination of unique data sets responding to a specific need.”

Big data has been around long enough that there should be less of a focus on how the data is gathered and more on the importance of value propositions. Value propositions demonstrate how the data can yield results and how those results can be used. Data value debates have been going on for a while, especially on LinkedIn. If Connotate and Outsell know how to turn data into dollars, they should advertise that instead of repeating big data specs.

Whitney Grace, January 11, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Free Data Mining Book

January 7, 2014

We enjoy telling you about free resources, and here’s another one: Mining of Massive Datasets from Cambridge University Press. You can download the book without charge at the above link, or you can purchase a discounted hardcopy here, if you prefer. The book was developed by Anand Rajaraman and Jeff Ullman for their Stanford course unsurprisingly titled “Web Mining.” The material focuses on working with very large data sets and emphasizes an algorithmic approach.

The description reminds us:

“By agreement with the publisher, you can still download it free from this page. Cambridge Press does, however, retain copyright on the work, and we expect that you will obtain their permission and acknowledge our authorship if you republish parts or all of it. We are sorry to have to mention this point, but we have evidence that other items we have published on the Web have been appropriated and republished under other names. It is easy to detect such misuse, by the way, as you will learn in Chapter 3.”

Nice plug there at the end. If you’re looking for more info on working with monster datasets, check out this resource—the price is right.
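For the curious, here is a minimal sketch of how republished text might be detected, assuming the chapter the authors mention is the one on finding similar items. It breaks each text into overlapping character "shingles" and compares the resulting sets with Jaccard similarity. The function names and sample strings are our own illustrations, not taken from the book.

```python
def shingles(text, k=5):
    """Return the set of k-character shingles in text (case and whitespace normalized)."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < k:
        # Text shorter than one shingle: treat the whole string as a single shingle.
        return {normalized} if normalized else set()
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical sample texts: an original, a lightly edited copy, and unrelated prose.
original = "It is easy to detect such misuse, as you will learn in Chapter 3."
copied = "It is easy to detect such misuse, as you'll learn in Chapter 3."
unrelated = "ZooKeeper provides a locking API for distributed applications."

print("copy score:     ", jaccard(shingles(original), shingles(copied)))
print("unrelated score:", jaccard(shingles(original), shingles(unrelated)))
```

A lightly edited copy scores far closer to 1.0 than unrelated prose does. Comparing every pair of documents this way does not scale to truly massive collections, which is where the book's algorithmic emphasis comes in.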

Cynthia Murrell, January 07, 2014

Creativity is Key for Data Scientists

January 6, 2014

Hmm, does this defy the easy-big-data narrative? VentureBeat warns us, “The Data Is Not Enough: Creative Data Scientists Make the Difference.” Not only is there a shortage of data scientists in general, we are now told firms would do well to find data scientists graced with creativity. How pesky.

Writer Jordan Novet refers to a recent panel at VentureBeat’s 2013 DataBeat/Data Science Summit, headed by LinkedIn’s former lead data scientist, Peter Skomoroch.

The article relates:

“Skomoroch envisions a world not too far in the future where balance sheets will track companies’ data assets. But he and other panelists don’t just want more data to analyze. They discussed the importance of creativity as a key trait to look for in people who work with the data. That means relying on proven algorithms might not always cut it.”

Novet shares with us the perspectives of a few panel members. For example, former Kaggle president Jeremy Howard, apparently the creative type himself, described his process:

“Howard likes to just dive into data and start getting hunches about it, without knowing about the industry the data comes from and other context that others would find valuable. ‘That way, there’s no blinkers,’ he said. It might come across as a contrarian view, but Howard thinks his approach is one reason he did well in Kaggle competitions.”

Other panelists quoted in the article include Jawbone’s VP of data, Monica Rogati, and Pete Warden, CEO of Jetpac. See the story for their thoughts.

Cynthia Murrell, January 06, 2014

Getting a Handle on Current and Future Data Compliance

January 3, 2014

Data is changing faster than the technology developed to manage it, and staying on top of new information management techniques is an even greater challenge. As an alternative to countless hours of research, Information Management reported on a Web seminar that recently took place called “Protect, Serve, And Comply: How To Stay Ahead Of Data And Compliance Challenges.” The seminar description notes that data is getting bigger every second, from social media to documents in fields such as healthcare, finance, and law.

Information management is even more critical as new laws are passed to regulate security, privacy, and the type of information stored. The webinar’s speakers include Allan D. Grody of Financial InterGroup and Travis Broughton, an enterprise architect. Grody will explain why identity resolution is a priority for organizations and how it will affect different industries.

Broughton then takes the stage for his share:

“Grody will be briefed by Intel Enterprise Architect Travis Broughton, who will show how a service gateway approach can offer identity mapping and enhanced data privacy with little to no impact on existing applications. These cloud gateways connect to on-premise enterprise architectures and address integration challenges for cloud, big data, and mobile applications. He’ll also discuss how data de-identification can remove key systems from audit scope, saving time, money and personnel resources.”

The article does not mention whether the webinar will include how-to methods and tips for attendees. Explanations are a strong starting point for understanding Grody’s and Broughton’s talks, but actual implementation is where the ideas will be put to the test.

Whitney Grace, January 03, 2014

Metalogix Provides Solution for SharePoint and Exchange Data

January 2, 2014

SharePoint and Exchange both contain huge amounts of data, and getting them to work together at maximum efficiency can be a great challenge. ZDNet offers a solution to this scenario in their latest article, “Metalogix Liberates SharePoint and Exchange Data.”

The article begins:

“SharePoint and Exchange are both complex products. While the multiplicity of setting and options makes it possible to adjust their operations to address an organization’s requirements, it also makes it difficult for organizations to pick up and move their data to take advantage of cloud service offerings or to migrate some or all of their data to different on-premise solutions.”

Metalogix says they have this problem covered:

“Metalogix pointed out that their products address the requirements IT decision-makers have to maintain tight management control of their content, security for that content and support a mirrored environment for higher levels of reliability.”

And while Metalogix may or may not be the right product for your enterprise, it is worth following a news source that gives you the latest on SharePoint, as well as tips and tricks. Stephen E. Arnold of ArnoldIT.com, a longtime leader in search, follows the latest in enterprise search, including SharePoint. Research proves that the organizations that get the most out of their SharePoint installation are those that customize and continue to tweak. Stay tuned for ways to optimize your installation.

Emily Rae Aldridge, January 2, 2014

Relational Data Stores Versus Hierarchical Databases

January 1, 2014

The article titled Codd’s Relational Vision – Has NoSQL Come Full Circle on OpenSource Connections relates the history of relational databases and applies its lessons to the NoSQL databases so popular today. It walks through the simplest databases, which followed the hierarchical model, and then moves on to generalized databases. It then delves into the work of Edgar F. Codd himself:

“When Codd wrote his paper, he criticized the DBTG databases of the day around the area of how the application interacted with the databases abstractions. Low-level abstractions leaked into user applications. Application logic became dependent on aspects of the databases: Specifically, he cites three criticisms: access dependencies… order dependencies… index dependencies… Codd proposed to get around these limitations by focusing on a specific abstraction: relations…. In short, Codd created a beautiful abstraction that turned out to be reasonable to implement.”

Then came the decision to build horizontally scalable systems, which were incompatible with Codd’s abstraction. The article ultimately suggests that the smart way to choose a database is to base the choice on your needs, not on what is currently trending. There is even a “Contact us” link for readers who aren’t sure what type of database to select, hierarchical or relational.
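To make the access-dependency criticism concrete, here is a small sketch of our own, not taken from the article. The same question is answered twice: once by walking a nested, hierarchical-style structure whose shape the application must know, and once with a relational query that names only a table and its columns. The data, table name, and function names are hypothetical, and SQLite stands in for a relational store.

```python
import sqlite3

# Hierarchical-style store: orders are reachable only through their parent customer.
hierarchy = {
    "acme": {"orders": [{"id": 1, "item": "widget"}, {"id": 2, "item": "gear"}]},
    "globex": {"orders": [{"id": 3, "item": "widget"}]},
}


def widget_buyers_hierarchical(tree):
    # The application must know the access path (customer -> orders) and walk it itself.
    return sorted(
        name
        for name, record in tree.items()
        if any(order["item"] == "widget" for order in record["orders"])
    )


def widget_buyers_relational(conn):
    # The query names a relation and its attributes; no access path is baked into the code.
    rows = conn.execute(
        "SELECT DISTINCT customer FROM orders WHERE item = ? ORDER BY customer",
        ("widget",),
    )
    return [row[0] for row in rows]


# Same facts, stored as rows in a single relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "acme", "widget"), (2, "acme", "gear"), (3, "globex", "widget")],
)

print(widget_buyers_hierarchical(hierarchy))
print(widget_buyers_relational(conn))
```

If the nesting were later reorganized, say by item instead of by customer, the first function would have to be rewritten, while the SQL would keep working against the same relation. That difference is the kind of dependency the quoted passage describes.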

Chelsea Kerwin, January 01, 2014

Connotate Still Going Strong

December 16, 2013

We see that Connotate continues to grow. In the News section of their Web site, the company announces, “Connotate Sees Growth and Expands Global Footprint in Q3.” The press release states that, over the last quarter, the Web-data firm has won new clients, renewed existing ones, and formed partnerships that extend their presence around the world.

We learn:

“AT&T, Experian and APCOA are among the new client wins; renewals include Thomson Reuters, Dow Jones and ADP. All of these top companies rely on Connotate to monitor and collect precise and complex data, at scale, to advance business capabilities….

“Connotate continues to grow its partner network both locally and abroad. Connotate added US-headquartered partners Ntrepid and Basis Technology to its partner network, as well as Shenzhen Plan Software, a China-based reseller focused on satisfying demand for big data solutions in Asia.”

The write-up also notes that FetchCheck, the company’s background check solution, processed more than four million transactions in 2013, a substantial number in that field. Based in New Brunswick, New Jersey, Connotate was founded in 2000. The company strives to simplify web-data extraction and monitoring, providing clients with strong business insights through a user-friendly platform.

Cynthia Murrell, December 16, 2013

Digital Reasoning Unleashes Human Analysis to the Cloud

December 8, 2013

Digital Reasoning has built its reputation by providing products that automate the understanding of human communication. One could say they put the humanity in technology. Now Digital Reasoning has taken its technology to a “higher” level, as Broadway World reports in “Digital Reasoning Debuts Cloud Version Of Its Machine Learning Platform That Analyzes Human Language, Set Sights On Data Scientists.” The Synthesys Machine Learning Platform will be released on the AWS Marketplace and will be available as Synthesys Cloud.

Digital Reasoning hopes that putting Synthesys Cloud on the AWS Marketplace will allow its clients to process and analyze larger amounts of unstructured data faster and more efficiently. It will also offer a number of benefits to data scientists:

  • “Rather than spending time on IT tasks such as installing and configuring various hardware and software components, users are able to launch a Synthesys cluster with just a few clicks allowing them to focus on uploading, analyzing and exploring data.
  • Synthesys simplifies the parsing of human language data such as Web content, documents, emails and other electronic communications into semantically rich structures (i.e. entities, facts and relationships) so that data scientists do not have to be subject matter experts in Natural Language Processing (NLP).
  • Synthesys Cloud offers initial support for 3rd party query tools such as Apache Hive, which gives users power and flexibility to explore and visualize Synthesys output.
  • Synthesys Cloud on the AWS Marketplace makes it easy and affordable for any budget to pay-as-they-go by taking advantage of low hourly billing rates and the ability to combine Synthesys with other AWS offerings.”

The company stresses that Synthesys Cloud is an amazing new tool for data scientists. However, it offers separate reasons why the product is beneficial for other clients as well. Does the company think that business professionals will approach the software differently than data scientists?

Whitney Grace, December 08, 2013

The Perks of HP Autonomy’s IDOL

November 25, 2013

The promotional article on HP Autonomy titled IDOL, The OS For Human Information touts the abilities of HP IDOL (even including a fancy diagram). The amount of data that HP IDOL can manage seems to be of central importance, as does its versatility in sorting and collecting data from different types of sources, be it social media, cloud, on-premise, image, audio, or structured data. The article explains:

“With HP IDOL, you can access, analyze, understand, and act on large amounts of human information from virtually any source… These capabilities make IDOL the OS for human information. With IDOL’s exploratory analytics, you can unlock key ideas, patterns, and concepts in your structured and unstructured data with streamlined processing, tuned for optimal performance. Uncover new opportunities, spot new trends, automate processes, break down silos, mitigate risks, and cut costs to elevate your organizational efficiency and effectiveness by enabling your data to tell you the answers.”

White papers are also available, such as Transitioning to a New Era of Human Information, but first you must register. The article also exclaims over IDOL’s 360-degree viewing platform, which is meant to ensure that information from social media is just as understandable and viewable as anything from a spreadsheet. Unfortunately, this mass-data handling might make for a sluggish system.

Chelsea Kerwin, November 25, 2013

ZooKeeper for Search Applications

November 20, 2013

Looking for Google-style tech to speed up your search app? The AppScale Blog presents us with an affordable option in “Emulating Google Megastore Using Open Source Technologies.” The article tells us why the Bigtable model alone falls short and how Apache’s ZooKeeper fills the gap (links in the quote are PDFs):

“The BigTable model is not enough to fully emulate the Google App Engine Datastore API, as it is based on Megastore, which provides the added benefit of transactions on partitioned data. For this AppScale uses ZooKeeper, the open source implementation of Google’s Chubby. ZooKeeper provides a locking API using a variant of the Paxos algorithm.

“To emulate Megastore with open source software, AppScale automatically sets up a datastore for applications to use and provides the mappings from the Google App Engine Datastore API to the Cassandra and ZooKeeper APIs. With both ZooKeeper and Cassandra, whether its a one node, or an eight node deployment, AppScale will create the configuration files, and start the correct processes on each node. Optionally, the AppScalefile (the AppScale configuration file) can dictate the amount of replication the datastore does. This also makes AppScale a great tool to use to automatically set up a Cassandra or ZooKeeper cluster.”

The write-up goes on to address data layout in Cassandra, query types, and ZooKeeper locks. At the bottom are several helpful links for further investigation. Oh, and a brief, unexplained, lukewarm beer review that is apparently part 16 in a series. It is good to have diverse interests.

Cynthia Murrell, November 20, 2013
