Hadoop Heats Up with New Startups

May 8, 2013

While there is some controversy over whether Hadoop is the only necessary tool to mine opportunities from big data, Hadoop and insights from big data seem to be synonymous according to Datamation’s recent article. They give us the rundown on “Seven Hot Hadoop Startups that Will Tame Big Data.”

According to this article, the current Hadoop ecosystem market is worth around $77 million. With growth, it is projected to reach $813 million by 2016. The article notes that Hadoop has not yet proven completely effective in the enterprise world. Queries are still a weak point.

The article discusses seven startups, Alpine Data Labs among them, that intend to see Hadoop through to maturity. The following excerpt explains why Alpine Data Labs made the list:

“According to Alpine Data, part of the problem is that it’s much too difficult to get real insights out of Hadoop and other parallel platforms. Most companies don’t know what to do with massive datasets, and few have gotten any further with Hadoop than batch processing and basic querying. Alpine Data set out to simplify machine-learning methods and make them available on petabyte-scale datasets. Their tools make these methods available in a lightweight web application with a code-free, drag-and-drop interface.”

With the amount of attention on Hadoop over the years, Hadoop startups are not a commodity. A list featuring a selection of the new ones to watch is much appreciated. Check out the full and useful list of hot Hadoop startups.

Megan Feil, May 08, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Inventive Graduate Student Builds Breakthrough Database

April 30, 2013

For some folks, deadlines can lead to innovation. One graduate student’s efforts to speed up his research have resulted in the inspired, high-speed parallel database MapD, we learn from DataInformed’s encouraging piece, “Fast Database Emerges from MIT Class, GPUs and Student’s Invention.” Todd Mostak’s in-a-pinch breakthrough could soon help others in business as well as academia.

The informative article contains too many specifics to cover here, but I suggest checking it out. It should be fascinating reading for anyone interested in data management. I personally think the use of graphics processors designed for gaming is a stroke of genius. Or maybe desperation (the two can be closely related). Reporter Ian B. Murphy tells us:

“While taking a class on databases at MIT, Mostak built a new parallel database, called MapD, that allows him to crunch complex spatial and GIS data in milliseconds, using off-the-shelf gaming graphical processing units (GPU) like a rack of mini supercomputers. Mostak reports performance gains upwards of 70 times faster than CPU-based systems. . . .

“‘I had the realization that this had the potential to be majorly disruptive,’ Mostak said. ‘There have been all these little research pieces about this algorithm or that algorithm on the GPU, but I thought, “Somebody needs to make an end-to-end system.” I was shocked that it really hadn’t been done.’”

Well, sometimes it takes someone from outside a field to see what seems obvious in retrospect. Mostak’s undergraduate experience was in economics, anthropology, and math, and he was in Harvard’s Middle Eastern Studies program when he was compelled to develop MapD. A database class at MIT gave him the knowledge he needed to build this tool, which he created to help with the tweet-heavy, Arab Spring-related thesis he was working on.

MIT’s Computer Science and Artificial Intelligence Lab has now snapped up the innovator. Though some questioned hiring someone with such a lean computer-science education, Lab director Sam Madden knows that Mostak’s unconventional background only means he has a unique point of view. The nascent computer scientist has already shown he has the talent to make it in this field.

Though Mostak says he still has work ahead to perfect his system, he does plan to share MapD as an open source project in the near future. Is he concerned about opening his work to the public? Nope; he states:

“If worse comes to worst, and somebody steals the idea, or nobody likes it, then I have a million other things I want to do too, in my head. I don’t think you can be scared. Life is too short.”

That it is. I suspect we will be hearing more from this creative thinker in the years to come.

Cynthia Murrell, April 30, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Silo Syndrome Claims the Sky Is Falling

April 18, 2013

Organizations in the financial services, healthcare, technology, e-business and government industries are at an increased risk for the newly diagnosed “Silo Syndrome”, according to the article “Thousands of Companies Diagnosed with Dreaded ‘Silo Syndrome’” published by PR Newswire.

Apparently, the symptoms of corporate “Silo Syndrome” are as follows:

  • “An inability to immediately access business information
  • Searching for answers but never really finding them
  • Problems processing terms like “unstructured content”
  • A penchant to unnecessarily flatten relational data
  • Inability to join concepts together in real-time
  • Needlessly accessing multiple systems for ‘what’ and ‘why’ answers”

Big data giant Attivio is championing awareness initiatives for what they claim is an increasingly ubiquitous syndrome, as CTO Sid Probstein stars in his very own PSA-style video. Attivio has also created a “Six Signs of Silo Syndrome” warning sign, which can be printed and displayed anywhere.

While Attivio no doubt holds the cure to “Silo Syndrome”, maybe humans build silos because silos are useful. After all, silos are required by various regulations, and they suit certain types of business processes. Sure, there is room for improvement, but sometimes silos just make sense.

Samantha Plappert, April 18, 2013


Temis and MarkLogic Collaborate on Big Data Challenges

April 18, 2013

Well, this is quite a surprise. Temis announces, “TEMIS and MarkLogic Strengthen Strategic Alliance.” Semantic content-management firm Temis is partnering with MarkLogic, which boasts of providing the only enterprise NoSQL database on the market, to tackle unstructured data. The press release tells us:

“With new, enhanced integration capabilities, TEMIS’ Luxid® and MarkLogic® Server can now help organizations do more with their content. . . .

“TEMIS’ Luxid® and MarkLogic® Server count many joint customer implementations. Their integration delivers seamless semantic enrichment of data stored in the enterprise NoSQL database with the Luxid® domain-specific and multilingual annotation process. This enables organizations to build powerful Big Data applications, combining content semantics with real-time database agility to make massive volumes of unstructured content easier to exploit.”

Metadata master Temis was founded in 2000 by some folks with IBM-based text-mining experience under their belts. The company now has offices across Europe and North America. This year, their flagship Luxid Content Enrichment Platform won the Software & Information Industry Association’s Codie Award for Best Semantic Technology Platform.

With a laser focus on efficient and fruitful databases, MarkLogic is headquartered in Silicon Valley, with offices around the world. The company was founded in 2001, and has been working beyond the relational database since long before “big data” became a buzzword.

Cynthia Murrell, April 18, 2013


Recommind Moves into Healthcare

April 14, 2013

Recommind is embracing the healthcare market. Marketwire shares, “Recommind Will Be First Time Speaker and Sponsor at World Health Care Congress Conference.” With legal conquered, it looks like the company is on to new adventures. We learn from the press release:

“Recommind, a leader in unstructured data management, analysis and governance technology, today announced it will be sponsoring and speaking for the first time at the World Health Care Congress (WHCC) event on April 8-10 at the Gaylord National Harbor in Maryland. Recommind will join the global health care community of business, political, and academic leaders to actively share information and collaborate to improve the overall quality and cost of health delivery in the US and throughout the world.”

The company hosted a speaking session, at which they advised attendees on key analytics issues, like implementing an efficient infrastructure, communicating information back to providers, analytics-informed preventative programs, and sharing improved outcomes. It is good to see the company branching into the spirited medical arena.

Experts at handling unstructured data, Recommind provides search-powered analysis and governance solutions to customers around the world. These tools are built around their CORE information management platform. Headquartered in San Francisco, the company was formed in 2000.

Cynthia Murrell, April 14, 2013


Understanding JSON

April 8, 2013

The Altova Blog piece “Editing, Converting and Generating JSON” provides a helpful guide to using JSON. The use of JSON as a data interchange format has been on the rise, and so has the debate about the advantages of JSON vs. XML. The debate rages on, but the author sums it up fairly well:

“But when you boil it down, there are simply some cases for which JSON is the best choice, and others where XML makes more sense. While you might need to choose between JSON and XML depending on the development task at hand, you don’t have to choose between code editors – XMLSpy supports both technologies and will even convert between the two.”

Altova has extended its intelligent XML editing features to its JSON editor in order to make JSON editing as simple as possible. Users who begin editing JSON in text view will get lots of help along the way from XMLSpy in the form of syntax coloring, bracket matching, source folding, entry helper windows, menus and other helpful tools. A one-click option on the XMLSpy Convert menu makes converting XML to or from JSON quick and easy. The ability to both edit and convert items directly within the XML editor is extremely useful. JSON lovers will definitely have something to look forward to.
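For readers who want a feel for what an XML-to-JSON conversion involves, here is a minimal Python sketch. It is not how XMLSpy does it; the mapping rules (leaf elements become strings, repeated tags become arrays, attributes are ignored) and the names `element_to_dict`, `xml_to_json`, and `SAMPLE_XML` are assumptions made purely for illustration.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an XML element into plain dicts, lists, and strings.

    Illustrative mapping only: attributes are ignored, and mixed
    content (text next to child elements) is dropped.
    """
    children = list(element)
    if not children:
        # Leaf element: keep its text, or an empty string if there is none.
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag becomes a JSON array.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(xml_text):
    """Parse an XML string and return pretty-printed JSON text."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


SAMPLE_XML = (
    "<post><title>Understanding JSON</title>"
    "<tag>json</tag><tag>xml</tag></post>"
)
print(xml_to_json(SAMPLE_XML))
```

Even this toy version has to make choices that XML and JSON do not agree on, such as what to do with attributes or repeated elements, which is part of why a tool that handles the conversion for you is handy.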

April Holmes, April 08, 2013


Ingersoll Says the Solution is Search

April 4, 2013

For companies tackling big problems related to large sets of data, Grant Ingersoll has the solution: search. At the recent GigaOm Structure: Data Conference, Ingersoll, CTO of LucidWorks, recommended that organizations take another look at search solutions. GigaOm covers the details in their story, “How Search Can Solve Big Data Problems.”

The article begins:

“There are many solutions for figuring out how to parse large amounts of data, but LucidWorks CTO Grant Ingersoll has a suggestion: use search. At GigaOM’s Structure:Data conference in New York City Thursday, Ingersoll laid out his case for why search is a big part of dealing with databases and indexes. ‘Search should be a critical part of your architecture,’ he told attendees. It is a system building block for any large problem you’re trying to solve that requires a ranked set of results. And it doesn’t have to be just text search, it can be for any type of search, he said.”

Ingersoll goes on to assert that search has changed dramatically and quickly. For those organizations that have not updated their search solution in several years, there are more options on the market that are likely to serve their purposes more effectively. LucidWorks, Ingersoll’s company, is a longstanding name in the field, and yet has undergone dramatic changes even in the last few years. If your organization is exploring options for more effective search and Big Data management, LucidWorks is worth a serious look.

Emily Rae Aldridge, April 4, 2013


Advice for Scalable Search from Parse

April 4, 2013

Ah, the excitement of scaling. The ParseBlog gives developers some practical advice in “Implementing Scalable Search on a NoSQL Backend.” As the makers of the popular cloud platform used by such conspicuous clients as Cisco, Ferrari, and the Food Network, Parse should know what they’re talking about, particularly when it comes to working with their product.

Engineer Brad Kittenbrink emphasizes that simple search algorithms, perfectly good for quickly getting a prototype up and running, can lead to seriously bogged-down performance later. He writes:

“The key to making searches run efficiently is to minimize the number of documents that have to be examined when executing each query by using an index. To do that you need to keep in mind what kinds of queries you want to support when designing how to organize your data. The more structured and limited these queries are, the easier this will be. . . .

“To organize your data model to support efficient searching, you’ll need to know a bit about how our systems are operating behind the abstraction. You’ll need to build your data model in a way that it’s easy for us to build an index for the data you want to be searchable.”

The post notes that Parse has implemented some new features to make searches more efficient, and goes on to give a couple of examples, including some sample code. Launched in 2011, the company is located in San Francisco. And, by the way, they are hiring.
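To make the indexing idea concrete, here is a minimal Python sketch of an inverted index. It is not Parse’s code or API; the `InvertedIndex` class, the `tokenize` helper, and the sample documents are hypothetical, chosen only to show how an index limits the number of documents a query has to examine.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words, dropping basic punctuation."""
    words = (word.strip(".,!?").lower() for word in text.split())
    return [word for word in words if word]


class InvertedIndex:
    """Maps each word to the set of document ids that contain it."""

    def __init__(self):
        self.documents = {}
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Index the document once, at write time.
        self.documents[doc_id] = text
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query word."""
        words = tokenize(query)
        if not words:
            return []
        # Only documents listed under the query words are considered;
        # the rest of the collection is never examined.
        matches = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            matches &= self.postings.get(word, set())
        return sorted(matches)


index = InvertedIndex()
index.add(1, "Scalable search on a NoSQL backend")
index.add(2, "Hadoop startups tame big data")
index.add(3, "Search is a building block for big data")

print(index.search("big data"))
print(index.search("search"))
```

The trade-off is the one the quoted passage describes: the work moves to write time, and the index only helps with the kinds of queries it was designed for (here, exact whole-word matches).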

Cynthia Murrell, April 04, 2013


Retire the Label Unstructured Data

March 27, 2013

Grant Ingersoll, CTO of LucidWorks, is sick and tired of the term “unstructured data.” It is really hard to blame him. The term is everywhere these days, and tends to stand for any data that is hard for a traditional database to capture.

Ingersoll says:

“I think that, in the early days of databases, someone coined ‘unstructured’ as a derogatory term to mean ‘all the stuff a database isn’t good at working on.’ If ‘structured’ is good, then ‘un’-structured must be bad, right? The problem is that working with text is one of the defining computational challenges of our time. We need our best and brightest working on it; and not just so we can better target ads to consumers. It’s too full of promise to describe with such a diminutive word as ‘unstructured.’ Numerical data? Child’s play! Text? Now there’s a real challenge.”

Ingersoll goes on to say that “rich data” is his new phrase of choice. If unstructured is meant to be negative, and text is some of the most challenging, but most rewarding content we have available, then rich may very well fit the bill. Regardless, end users are looking for solutions to tackle their individual content storage and retrieval problems. LucidWorks, the company that Ingersoll helped found, does just that. So unstructured or rich, LucidWorks has the solution to meet your data needs.

Emily Rae Aldridge, March 27, 2013


Loom Dataset Management for Hadoop Released by Revelytix

March 27, 2013

The Data Center Knowledge article “Revelytix Launches Loom Dataset Management for Hadoop” celebrates the early access availability of Loom Dataset Management for Hadoop. Revelytix, a big data software and resource provider, offers tools to enable data scientists to work with Hadoop. Loom is the product of years of design and innovation for the Department of Defense, pharmaceutical companies, financial services and leading intelligence agencies in the United States. Loom’s capabilities are explained in the article as follows:

“‘Loom makes it easy for data scientists and IT to build more analytics faster with easy-to-use interfaces that simplify getting the right data for the job quickly and managing datasets efficiently over time with proper tracking and data auditing,’ said Revelytix CEO Mike Lang. Loom includes dataset lineage so you know where a dataset came from, Active Scan to dynamically profile datasets, Lab Bench for finding, transforming, and analyzing data in Hadoop and Hive; data suitability, and open APIs.”

As this excerpt reveals, the article reads more like a Revelytix company newsletter than anything else. It goes on to state that Revelytix also recently announced that it would continue its work for the Department of Defense in 2013, broadening the implementation of the data management capabilities already in place.

Chelsea Kerwin, March 27, 2013

