Big Data and Small Data: Tesla and the Gray Lady

February 15, 2013

I don’t want to let this anecdote slip by without capturing it. The addled goose remains quietly in Harrod’s Creek, Kentucky. I ignore the bleats of public relations professionals who “assure me” that unwanted email to me is not spam. You can track the exploits of this PR outfit yourself at North of Nine Communications.

Nope, a more interesting New York style tussle is underway and it concerns what I call small data. In our pulse pounding world of Big Data with IBM Watson tackling cancer, I find small data interesting.

Here’s the story as I understand it. A big newspaper, not too far from the North of Nine outfit, collected some small data about the performance of an electric car. I don’t know about you, but these electric cars stop running when the batteries are exhausted. No big surprise.

The car maker performed various tests and analyses and presented small data to explain that the big newspaper’s small data were incorrect or maybe just out of round. I don’t know, and I don’t care. If you are curious about the status of the dust up, read “Tesla CEO Reveals Evidence against New York Times’ Damning Review in Blog Post.”

My point is that when two outfits cannot agree on small data, which presumably both have scrutinized closely, what confidence should you, gentle reader, have in the outputs of Big Data systems? These systems are based on methods which most folks, including the addled goose, do not understand. Forget the data’s integrity. Let’s just assume that Big Data works like a bulldozer and smooths out the imperfections.

Well, data and methods don’t smooth out anything. The choices made and the interpretation make a difference for both small data and Big Data.

My point: If we cannot get the small data right, how can we have confidence that Big Data’s outputs, methods, systems, and processes are right? I can’t and won’t. Spats about small data are amusing, but they illuminate the cloud of craziness which blankets some interesting activities.

Stephen E Arnold, February 15, 2013

Information Delivery Solutions Maximize Value of Big Data

February 15, 2013

It is no surprise that, amid the larger cultural and technological changes following the rise of big data, we are seeing many exciting developments at a more specific level. Science Daily discusses how a crowdsourcing platform that began in the commercial sector can solve a complex biological problem even faster than traditional approaches in the article, “Solving Big Data Bottleneck: Scientists Team with Business Innovators to Tackle Research Hurdles.”

Harvard Medical School, Harvard Business School and London Business School have partnered with TopCoder, a crowdsourcing platform with a global community of 450,000 algorithm specialists and software developers, and have discovered that this community is highly adept at solving the kinds of problems typically delegated to postdocs.

The article quotes Karim Lakhani, associate professor in the Technology and Operations Management Unit at Harvard Business School:

This study makes us think about how greater efficiencies in academic research can be obtained. In a traditional setting, a life scientist who needs large volumes of data analyzed will hire a postdoc to create a solution, and it could take well over a year. We’re showing that in certain instances, existing platforms and communities might solve these problems better, cheaper and faster.

Many organizations in the business sector, as well as in academia, are searching for more efficient ways to store, organize, and process big data in order to maximize its value. Information delivery solutions are great tools for enabling organizations to access insights from big data across the entire company.

Megan Feil, February 15, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search.

Solr Unleashed Offered by LucidWorks

February 15, 2013

LucidWorks is a company offering commercial support, consulting, training, and value-added software for the open source Apache Lucene and Solr technologies. LucidWorks not only builds upon trusted open source technologies, it supports the open source community by employing committers. The company also offers professional training on the open source components, even for those who are not interested in its LucidWorks Search or LucidWorks Big Data solutions. One such training opportunity is Solr Unleashed.

Read about upcoming classes:

“Having consulted with clients on Lucene and Solr for the better part of a decade, we’ve seen the same mistakes made over and over again: applications built on shaky foundations, stretched to the breaking point. In this two day class, learn from the experts about how to do it right and make sure your apps are rock solid, scalable, and produce relevant results. Also check the course outline.”

Register early for a discount on the two-day class. Opportunities are available stateside, as well as in Europe. Developers are the primary audience for the sessions, but system administrators can benefit as well. For more opportunities and to stay in the loop, contact the LucidWorks University team.
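For readers new to the technology, applications like those the class covers talk to Solr over its HTTP search API. A minimal sketch of such a query, assuming a local Solr instance with the default /select handler (the host, port, and field name here are illustrative, not from the course material):

```python
from urllib.parse import urlencode

# Build a query against Solr's standard /select search handler.
# The host, port, and 'title' field below are assumptions for illustration.
params = {
    "q": "title:lucene",  # search a hypothetical 'title' field
    "rows": 10,           # return at most ten documents
    "wt": "json",         # ask Solr for a JSON response
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)

# With a Solr server running locally, the documents could then be fetched:
#   import json, urllib.request
#   docs = json.load(urllib.request.urlopen(url))["response"]["docs"]
```

Getting details like relevance tuning and scaling right beyond this toy level is exactly what the training targets.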

Emily Rae Aldridge, February 15, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

DataFacet Video

February 15, 2013

DataFacet’s stream of news slowed in late 2012. The outfit seems to be quiet; what’s going on over there? While we wait for their next move, check out the interesting video on the DataFacet Web site, which effectively introduces their product. It begins with a good explanation of “taxonomy,” which might be useful to bookmark in case you need to define the term for someone unfamiliar with the field. The video goes on to show someone using parts of the DataFacet system, which gives a much better idea of what it does than any text explanation could. It’s set to a catchy tune, too.

The product description surrounding the video specifies:

DataFacet provides a taxonomy based data model for your enterprise’s unstructured information along with a sophisticated, yet easy to use, set of tools for applying the data model to your content.

It’s an easy three step process:

  1. Choose your foundation taxonomies from the DataFacet library of over 500 topic domains
  2. Customize your taxonomy with DataFacet Taxonomy Manager
  3. Tag your content with DataFacet Taxonomy Server
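The three-step workflow above can be sketched in miniature. DataFacet’s actual APIs are not public, so every name, taxonomy term, and document below is an invented illustration of taxonomy-based tagging in general:

```python
# Step 1: choose foundation taxonomies (a tiny stand-in for a library
# of topic domains, mapping each facet to terms that signal it).
taxonomies = {
    "Finance": {"invoice", "audit", "revenue"},
    "Healthcare": {"patient", "clinic", "diagnosis"},
}

# Step 2: customize a taxonomy for a specific organization.
taxonomies["Finance"].add("procurement")

# Step 3: tag content by matching taxonomy terms against each document.
def tag(document):
    words = set(document.lower().split())
    return sorted(facet for facet, terms in taxonomies.items()
                  if words & terms)

print(tag("The clinic sent an invoice after the patient visit"))
```

A production system like DataFacet’s would of course go well beyond literal term matching, but the shape of the workflow, pick, customize, tag, is the same.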

DataFacet is already available for a number of search and content environments.

DataFacet is actually a joint project, built by taxonomists from WAND and Applied Relevance. Based in Denver, Colorado, WAND has been developing structured multi-lingual vocabularies since 1998. Their taxonomies have been put to good use in online search systems, ad-matching engines, B2B directories, product searches, and within enterprise search engines.

Applied Relevance offers automated tagging to help organizations contextualize their unstructured data. They have designed their user interface using cross-platform JavaScript and HTML5, which gives their application the flexibility to run in a browser, be embedded in a Web page, or be hosted in an Adobe Air desktop application.

Cynthia Murrell, February 15, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Murdoch Says Wall Street Journal Still Under Hacker Attack

February 15, 2013

Now, isn’t this ironic? TNW reports, “Rupert Murdoch Claims Chinese Hackers Are Still Attacking the Wall Street Journal.” Didn’t Murdoch’s own News Corp. use improper methods to obtain information? I didn’t think karma usually worked that quickly.

Following revelations that the New York Times had been hacked, the world learned that the WSJ had also been targeted. Now, the paper’s (in)famous owner claims the attacks have not been stopped. Writer John Russell tells us:

“The Australia-born media mogul took to Twitter to reveal that the newspaper was still being targeted by Chinese hackers over the weekend. That’s just days after the WSJ bolstered its network security last week after its computer systems ‘had been infiltrated by Chinese hackers for the apparent purpose of monitoring the newspaper’s China coverage’.

“Murdoch has not provided any further substantiation of his claims.”

These two news outlets, as well as Bloomberg, seem to have been targeted as a result of their coverage of Chinese politics. Though there is as yet no evidence to support the theory, security experts suspect that the Chinese government is behind the intrusions. Such charges are nothing new to China, which is also known for its embrace of Internet censorship.

Cynthia Murrell, February 15, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Autonomy Improves its eDiscovery Software

February 15, 2013

HP is on the move, leveraging their Autonomy investment with new features, we learn in the company’s announcement, “HP Autonomy Strengthens eDiscovery Solution with New Information Governance Capabilities.”

The crucial early case assessment (ECA) phase occurs at the onset of a legal procedure, when large volumes of data must be assessed quickly, thoroughly, and carefully. The press release informs us:

“Autonomy has extended its Meaning Based Coding (MBC) capability to its ECA module, further enhancing its in-depth eDiscovery analysis capabilities. Autonomy’s MBC capabilities enable organizations to automate analysis based on the Autonomy Intelligent Data Operating Layer (IDOL), which quickly categorizes data by concepts, ideas and patterns in information. Unlike traditional predictive coding technologies, MBC classifications are carried through to the review and production phase without new processing or indexing. As a result, Autonomy ECA can perform an analysis of the data faster, more accurately and at a lower cost.”

Also new is the software’s integration with HP’s Application Information Optimizer, which automates data migration and retirement. Furthermore, Autonomy has added native discovery functionality to the on-premise version of their archiving solution, Autonomy Consolidated Archive. They say these improvements streamline the eDiscovery process, saving money, time, and frustration.

Autonomy, founded in 1996, offers solutions that use IDOL to tame mind-boggling amounts of unstructured data. The technology grew from research originally performed at Cambridge University, and now serves prominent public and private organizations around the world. HP acquired Autonomy in 2011.

Cynthia Murrell, February 15, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Sinequa France: Update 2013

February 14, 2013

My research team was winnowing our archive of information about European search vendors. Since Martin White’s article for eContent in 2011, a number of changes have swept through the search and content processing sector. Some changes were significant; for example, HP’s stunning acquisition of Autonomy. Others were more modest; for example, the steady progress of such companies as Sinequa and Spotter, among others.

The European technical grip on search is getting stronger. Google is the dominant player in Web search. But in enterprise content processing, some European firms are moving more rapidly than their North American or Pacific Rim counterparts.


The Sinequa tag cloud. See http://www.sinequa.com/en/page/solutions/category-1.aspx

One interesting example is Sinequa, based in Paris. The company, like other French technology firms, has a staff of capable engineers and managers. However, unlike some other companies, Sinequa has continued to establish a track record as a company innovating in technology and capturing some important accounts; for example, Siemens, the German industrial powerhouse.

Sinequa’s approach is to emphasize that enterprise search has moved to unified information access. A number of companies make similar claims. Sinequa has established that its technology can deliver the type of one-stop access to structured and unstructured content that almost every vendor claims to deliver. You can get a useful overview of the architecture of the Sinequa platform at http://www.sinequa.com/en/page/product/product.aspx.

Relatively recent additions to the Sinequa.com Web site are case analysis videos. I find case examples extremely useful. The presentation of this type of information in rich media format makes it easier for me to get a sense of the value of the solution a vendor delivers. I found the Mercer video particularly interesting. You can find these testimonials at http://www.sinequa.com/en/page/clients/clients-video.aspx.

The trajectory of European search, content processing, and analytics vendors is difficult to plot in today’s uncertain economic climate. Sinequa warrants a close look for organizations seeking an integrated approach to their content assets. For more information about Sinequa’s current activities, tap into the firm’s blog at http://blog.sinequa.com/

Stephen E Arnold, February 14, 2013

Sponsored by EMRxNow, the information service which tracks automated indexing of electronic medical records

Enterprise Organizations Search for Solutions to Deliver Insights

February 14, 2013

While ETL technologies were once good enough on their own, the era of big data has created demand for augmenting technologies. However, Smart Data Collective points out that it is not just big data, but also the need for predictive analytics, that has caused the paradigm shift. Their article “Data Integration Ecosystem for Big Data Analytics” defines common terminology related to enterprise software in a world inundated with big data, grounding each term in a business context.

The author identifies six components of the integrated data ecosystem in a typical enterprise organization: sources, big data storage, a data discovery platform, an enterprise data warehouse, a business intelligence portfolio, and a data analytics portfolio.

We learned the following from the article with regard to what processes integrated data can make easier and more efficient:

While the business intelligence deals with what has happened, business analytics deal with what is expected to happen. The statistical methods and tools that predict the process outputs in the manufacturing industry have been there for several decades, but only recently they are being experimented with the organizational data assets for a potential to do a much broader application of predictive analytics.
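The distinction the article draws, business intelligence describing what has happened versus business analytics predicting what is expected to happen, can be illustrated with a minimal sketch. All the numbers here are invented, and the hand-rolled trend line stands in for the far richer statistical methods the article alludes to:

```python
# Toy series of monthly process outputs (units produced); values invented.
monthly_output = [100, 104, 108, 113, 117, 121]

# Business intelligence: describe what has happened.
n = len(monthly_output)
average = sum(monthly_output) / n

# Business analytics: predict what is expected to happen, here with a
# plain least-squares trend line fitted by hand (no libraries needed).
xs = range(n)
x_mean = sum(xs) / n
slope = sum((x - x_mean) * (y - average) for x, y in zip(xs, monthly_output)) \
        / sum((x - x_mean) ** 2 for x in xs)
intercept = average - slope * x_mean
next_month = slope * n + intercept  # extrapolate one period ahead

print(f"average so far: {average:.1f}")
print(f"forecast for next month: {next_month:.1f}")
```

The first number looks backward; the second looks forward. Applying that second kind of computation across an organization’s data assets is the broader application of predictive analytics the article describes.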

This was a useful write-up, as it sheds light on one of the most important challenges facing enterprise organizations right now: getting a grip on big data. Organizations are looking for solutions that can deliver enterprise information in real time and across various departments and applications.

Megan Feil, February 14, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search.

Microsoft Moves Closer to Open Source

February 14, 2013

Git support has been added to Microsoft’s Visual Studio, and the IT world is all atwitter. Microsoft has long stood directly opposite open source, but this move begins to bridge the gap. InfoWorld draws attention to the news with their story, “Has Microsoft Finally Embraced Open Source?”

The article begins:

“News broke Wednesday about Microsoft adding support for Git to Visual Studio, both in the client — so that it can be used to work against any Git DVCS (distributed version control system) such as Gitorious or GitHub — and on the server. The upshot is twofold: Those using Microsoft’s proprietary centralized version control have a new escape route, and GitHub has a new competitor.”

Microsoft has embarked on a warming trend toward open source, and Git is gaining popularity among developers. However, GitHub, a major Git repository, has experienced recent search problems. And while Microsoft’s warming toward open source is definitely good news, it may not make a practical difference to most small and medium enterprises. For those businesses, an industry-trusted solution like LucidWorks is probably the best course of action.

Emily Rae Aldridge, February 14, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Change Comes to Attensity

February 14, 2013

Just as the demand for analytics is ascending, Attensity makes a management change. We learn the company recently named J. Kirsten Bay their head honcho in “Attensity Names New President/CEO,” posted at Destination CRM. The press release stresses the new CEO’s considerable credentials:

“Bay brings to Attensity nearly 20 years of strategic process and organizational policy experience derived from the information management, finance, and consumer product industries. She is an expert in advising both the public and private sector on the development of econometric policy models. Most recently, as vice president of commercial business with iSIGHT Partners, Bay provided strategic counsel to Fortune 500 companies on managing intelligence requirements and implementing customer and development programs to integrate intelligence into decision programs.”

The company’s flagship product Attensity Pipeline collects and semantically annotates data from social media and other online sources. From there, it passes to Attensity Analyze for text analytics and customer engagement suggestions.

Headquartered in Palo Alto, California, folks at Attensity pride themselves on the accuracy of their analytic engines and their intuitive reports. Rooted in their development of tools that serve the intelligence community, the company now provides semantic solutions to many Global 2000 companies and government agencies.

Cynthia Murrell, February 14, 2013

Sponsored by ArnoldIT.com, developer of Augmentext
