May 22, 2013
After the data is extracted, much can be done with it:
“Extracted results can be saved to csv, Excel(xls), SQLite, Access, SQL Server, MySQL, PostgreSQL, and can specify the database fields’ types and attributes(eg, UNIQUE can avoid duplication of the extracted data). According to the setting, program can build, rebuild or load the database structure, and save the data to an existing database. Professional edition support incremental extraction, clear extraction and schedule extraction.”
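The UNIQUE point in the quote is easy to see in miniature. The sketch below is not FMiner's code; it uses Python's built-in sqlite3 module, and the table name `pages`, its columns, and the sample URLs are all made up for illustration. It shows how a UNIQUE column lets a second extraction run skip rows that were already saved.

```python
import sqlite3

def save_rows(conn, rows):
    """Insert (url, title) rows, skipping any url already stored.

    The table and column names are hypothetical, chosen for this sketch.
    Returns the total number of rows in the table afterwards.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT UNIQUE, "  # UNIQUE rejects a second row with the same url
        "title TEXT)"
    )
    # INSERT OR IGNORE drops rows that would violate the UNIQUE constraint
    conn.executemany(
        "INSERT OR IGNORE INTO pages (url, title) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
first = save_rows(conn, [("http://example.com/a", "A"),
                         ("http://example.com/b", "B")])
# The second run repeats one url and adds one new url
second = save_rows(conn, [("http://example.com/b", "B again"),
                          ("http://example.com/c", "C")])
print(first, second)
```

Only the new URL is added on the second run; the repeated one is ignored rather than stored twice.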
FMiner Pro is available as a free fifteen-day trial, so you can see how well it performs. Judging by the specs, it is worth a shot. It can probably save coders hours of script writing, and organizing Web content is a tedious job no one likes to do. Having a program do it is far preferable.
Whitney Grace, May 22, 2013
May 13, 2013
TechCrunch ran a story on a new enterprise file sharing tool, Docurated, which launched at Disrupt NY during Startup Battlefield. “Docurated is an Enterprise Service to Search and Collect the Data You Need From Your Files” tells us that this technology moves beyond the file and folder metaphor and focuses on searching for the documents needed and collecting them.
This new enterprise search tool is poised to compete with the likes of SharePoint and Autonomy and, in a way, Google Drive. Interestingly, it integrates with Dropbox, another potential competitor. A notable difference the article points to is that Docurated only crawls content to make it searchable; it does not actually host any files.
We looked a little further into the technology on their website and learned the following about their positioning:
“While storage boxes in the cloud have created the ability to amass more files, we still have to find and consume what we need when we need to tell our story. Docurated is your go-to destination for all of your content. No more files or folders. It turns all your documents into useable materials for your content dashboards, presentations, meetings, pitches, etc. in PowerPoint or PDF formats. Docurated provides you with the ability to turn every one of your documents into individual pages that are then presented to you based on relevance to your topic search…”
The branding and the focus on user experience signal that Docurated is looking to make a name for itself by bringing the consumerization of enterprise search home. We will be on the lookout to see how disruptive this technology turns out to be; Coca-Cola and Netflix are both using it already.
Megan Feil, May 13, 2013
May 8, 2013
While there is some controversy over whether Hadoop is the only necessary tool to mine opportunities from big data, Hadoop and insights from big data seem to be synonymous according to Datamation’s recent article. They give us the rundown on “Seven Hot Hadoop Startups that Will Tame Big Data.”
According to this article, the current Hadoop ecosystem market is worth around $77 million. With growth, that value is projected to reach $813 million by 2016. The article notes that Hadoop has not yet proven completely effective in the enterprise world. Queries are still a weak point.
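To put those two figures in perspective, here is a quick back-of-the-envelope calculation. The dollar amounts come from the article; the three-year span (2013 to 2016) is our assumption, since the article only calls the $77 million figure "current", and the function names are ours.

```python
def growth_multiple(start, end):
    """How many times larger the end value is than the start value."""
    return end / start

def annual_growth_rate(start, end, years):
    """Compound annual growth rate implied by start, end and a span in years."""
    return (end / start) ** (1 / years) - 1

# $77 million today, $813 million projected (figures from the article)
multiple = growth_multiple(77, 813)
# years=3 is an assumption: it treats "current" as 2013
rate = annual_growth_rate(77, 813, years=3)

print(f"{multiple:.1f}x overall")
print(f"{rate:.0%} per year over the assumed span")
```

However the span is counted, the projection amounts to better than a tenfold increase in market value.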
The article discusses seven startups, Alpine Data Labs among them, that intend to see Hadoop through to maturity. The following excerpt explains why Alpine Data Labs is on this list:
“According to Alpine Data, part of the problem is that it’s much too difficult to get real insights out of Hadoop and other parallel platforms. Most companies don’t know what to do with massive datasets, and few have gotten any further with Hadoop than batch processing and basic querying. Alpine Data set out to simplify machine-learning methods and make them available on petabyte-scale datasets. Their tools make these methods available in a lightweight web application with a code-free, drag-and-drop interface.”
With the amount of attention on Hadoop over the years, Hadoop startups are hardly in short supply. A list featuring a selection of the new ones to watch is much appreciated. Check out the full and useful list of hot Hadoop startups.
Megan Feil, May 08, 2013
May 7, 2013
The Sunlight Foundation and Media Standards Trust have collaborated on a joint project to address plagiarism in the media. Their creation? A web tool and browser extension, Churnalism US. We took a look at The Sunlight Foundation’s recent article on it: “Churnalism: Discover When News Copies from Other Sources.”
The browser extension will be available for Chrome, Internet Explorer and Firefox (full approval pending). The process works by enabling Churnalism to extract article text from a whitelist of common news sites. When it finds a match, it lets you know when the text you are reading might have been copied from another source. The capabilities of this extension are driven by open-source text analysis technology.
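For readers who want a feel for how this kind of matching can work, here is a minimal sketch. It is not Churnalism's actual algorithm, which the article describes only as open-source text analysis technology; it simply compares overlapping five-word runs ("shingles") between two texts. The function names and the sample sentences are invented for illustration.

```python
import re

def shingles(text, n=5):
    """Return the set of n-word runs in text, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(article, source, n=5):
    """Fraction of the article's n-word runs that also appear in the source."""
    article_shingles = shingles(article, n)
    if not article_shingles:
        return 0.0  # article shorter than n words: nothing to compare
    shared = article_shingles & shingles(source, n)
    return len(shared) / len(article_shingles)

# Made-up sample texts, not taken from any real article or press release
press_release = ("Researchers found a strong genetic link between epilepsy "
                 "and migraine in a new study of families.")
article = ("A new report says researchers found a strong genetic link "
           "between epilepsy and migraine in a new study.")
unrelated = ("The city council voted on Tuesday to approve the new budget "
             "for road repairs next year.")

print(overlap_ratio(article, press_release))
print(overlap_ratio(article, unrelated))
```

A high ratio suggests that large stretches of the article were lifted from the source; a tool could flag the page once the ratio crosses some threshold.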
Curious about how this would look? We were too, but luckily the article dove into those details:
“For some anecdotal evidence from my experience using Churnalism, I’ve found a number of instances of articles about science topics relying heavily on press releases and study summaries. For example, take this piece on the BBC website about epilepsy and migraines. Churnalism found a significant portion of the text came from this press release in EurekaAlert! and let me know with a ribbon notification on the top of the page. By tapping the Show Me button on the notification, Churnalism overlays a side-by-side display of the article and the possible match with copied text highlighted for easy comparison.”
With the need for information to be delivered in real-time and the proliferation of sources available to an ever expanding and niche-oriented audience, it is no wonder that there are enough “churnalists” to warrant this browser extension. Since we curate, we must be churnalists too.
Megan Feil, May 07, 2013
May 2, 2013
If you want to do math in Hadoop, this information on 0xdata/h2o from GitHub is for you. H2O runs on Apache Hadoop, the software library designed for processing large sets of data, to do math over BigData. The vision for the introduction involves using the primary execution framework for whatever algorithm is presented. The program also reads from and writes to HDFS, S3, NoSQL and SQL. It is even able to pass and evaluate R-like expressions. The article explains:
“H2O keeps familiar interfaces like R, Excel & JSON so that big data enthusiasts & experts can explore, munge, model and score datasets using a range of simple to advanced algorithms. Data collection is easy. Decision making is hard. H2O makes it fast and easy to derive insights from your data through faster and better predictive modeling. H2O has a vision of online scoring and modeling in a single platform.”
The targeted users are mainly data analysts. H2O hopes to energize the community of invested software engineering enthusiasts and provide everyone concerned with the tools to hack data with math and algorithms. If you are interested in being a part of this community, join the Google group h2ostream.
Chelsea Kerwin, May 02, 2013
April 30, 2013
Is this a new type of search? Pintrips insists it has something unique in a press release posted at Market Wired, “Pintrips Takes Flight—New Kind of App Takes the Chaos Out of Finding the Best Fare.” The cross-platform tool consolidates information travelers discover across sites, allowing them to share their findings publicly and privately, complete with real-time price and availability updates. That certainly sounds helpful. The write-up tells us:
“Pintrips leverages what’s spread out on the internet, making comparing and finding ideal flights and deals easier than it’s ever been. Instead of time consuming and memory-challenging flight comparisons on multiple windows, the Pintrips platform uniquely allows consumers to compare deals, itineraries and choices in new ways; apples to apples or apples to oranges. For example, compare different date pairs to the same destination to see when better deals are available; compare different destinations such as deals to Montego Bay versus deals to Aruba, Paris versus Barcelona with absolute ease.”
According to recent research, consumers are frustrated with the process of comparing deals across travel sites. Pintrips aims to fill that gap, expecting that frequent travelers will get the most out of their service. Public “pinning boards” enable users to share travel information, bringing the crowd-sourcing impulse to this arena.
The company promises continued refinements to their product, including mobile apps. Currently, the site works best in Chrome, but developers promise Firefox, Safari, and IE functionality down the line. Pintrips was founded in 2011, and launched its beta product in November of last year. The company is headquartered in Sunnyvale, CA.
Cynthia Murrell, April 30, 2013
March 28, 2013
Raspberry Pi enthusiasts will be excited to read Escape Velocity’s article, “Ontopia Runs on Raspberry Pi.” Ontopia, a collection of open source tools for building, preserving and developing Topic Maps-based applications, reportedly works well on the Raspberry Pi, a credit-card-sized ARM GNU/Linux box built on the Raspberry Pi Foundation’s work. The article reports:
“Using the Raspberry Pi to run the Apache Tomcat server that hosts the Ontopia software, response time is as good or better than I have experienced when hosting the Ontopia software on a cloud-based Linux server at my ISP. Topic maps open quickly in all three applications and navigation from topic to topic within each application is downright snappy…
I am expecting the Pi to be viable development platform and a decent host for low-volume Tomcat-based demonstration applications that Pi enthusiasts might create.”
The promising findings reported in the article imply that the Pi may be capable of supporting more applications and technologies than previously thought. If you are interested in embedding taxonomy functions into a Raspberry Pi system, Ontopia has an answer. Pi enthusiasm has spread especially among high schoolers and grade schoolers. Fans even meet up at monthly “Raspberry Jams,” where like-minded people discuss the ins and outs of the system.
Chelsea Kerwin, March 28, 2013
March 21, 2013
Language and analytics are starting a new trend by coming together. According to the DestinationCRM.com article “New SDL Machine Translation Tool Integrates with Text Analytics,” SDL has announced that its machine translation tool can now be integrated with text analytics solutions. SDL BeGlobal can translate both structured and unstructured information across more than 80 different language combinations. The information is then analyzed using text analytics solutions. This gives users access to global customer insights as well as important business trends. Jean-Francois Damais, Deputy Managing Director of loyalty global clients solutions at Ipsos, had the following to say regarding SDL BeGlobal:
“With the growth in global business and the accessibility of online information, we now have a much greater need to access and analyze data from multiple languages. As a company focused on innovation and dedicated to our clients’ successes, we deployed SDL BeGlobal machine translation to further improve our research insights and bring new value to our customers.”
SDL BeGlobal has already caught on in the text analytics industry, and several well-known companies have jumped on the bandwagon. Raytheon BBN Technologies currently uses the technology for broadcast and Web content monitoring, and Expert Systems uses it for semantic intelligence. Language and analytics are not normally thought of together, but it seems SDL BeGlobal has a good thing going. Only time will tell if the new friendship between language and analytics will stand the test of time.
April Holmes, March 21, 2013
March 11, 2013
Real-time tools are used to record information that corresponds directly to actual life. One of the best examples of real-time information is the social networking tool Twitter. CNET wrote an article about Twitter’s time fallacy, “Time Calculator Shows Futility In Trying To Keep Up With Twitter.” The article mentions that in small doses, Twitter is a great tool for keeping up with information, but it can drive someone insane trying to follow it all the time. If you feel like life is passing you by when you cannot keep up with tweets, web developer Koobazaur has created the Tweetulator for you. The Tweetulator calculates how much time you would need to read every single tweet in your feed.
You input the number of people you follow, your reading speed, and the number of tweets you read a day. For example, Twitter co-founder Jack Dorsey would need fourteen hours each day to keep up with the 1,330 people he follows.
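We do not know the Tweetulator's exact formula, but the arithmetic behind a calculator like this is simple. The sketch below is our own guess at the shape of it: every parameter name and sample value is an assumption for illustration, not something taken from the tool or from the CNET article.

```python
def daily_reading_hours(following, tweets_per_person_per_day,
                        words_per_tweet, words_per_minute):
    """Estimate hours per day needed to read every tweet in a feed.

    All four inputs are hypothetical parameters for this sketch;
    the real Tweetulator may use different inputs and a different formula.
    """
    total_words = following * tweets_per_person_per_day * words_per_tweet
    minutes = total_words / words_per_minute
    return minutes / 60

# Sample values chosen only to show the calculation
hours = daily_reading_hours(following=100,
                            tweets_per_person_per_day=10,
                            words_per_tweet=20,
                            words_per_minute=200)
print(f"{hours:.2f} hours per day")
```

Because the estimate scales linearly with the number of accounts followed, the hours add up fast for anyone following more than a thousand people.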
“The Tweetulator results aren’t really that surprising, but it does manage to put Twitter time into perspective. Let’s just say that if I miss a few tweets here and there, I’m not going to feel bad about it.”
Let us say there is more to life than Twitter and time can be better spent developing new enterprise search strategies.
Whitney Grace, March 11, 2013
February 8, 2013
OpenNebula is in the business of infrastructure management, but is seeking to differentiate itself from the pack and hasten enterprise adoption. The full story is provided by GigaOm in its piece, “OpenNebula Open-sources Service Management Layer with Enterprise in Mind.”
The article begins:
“OpenNebula, the European answer to the likes of Eucalyptus and OpenStack that counts CERN and China Mobile among its customers, is moving to differentiate itself from competitors by freely releasing OpenNebulaApps, a suite of cloud application management tools that sit on top of its traditional infrastructure management toolkit. The OpenNebulaApps tools were previously available only to OpenNebulaPro customers but, according to project director Ignacio Llorente, OpenNebula realized there was more value in opening them up.”
OpenNebula is trying to build on its open source base and capitalize on the boom in cloud apps. It may be able to make a good go of it. But, on the other hand, while it may succeed in its traditional role of infrastructure management, it may be best to leave enterprise to the experts. One we would recommend taking a long hard look at is LucidWorks. LucidWorks can offer the ongoing trust of the industry and a reliance on the most trusted names in open source, Lucene and Solr.
Emily Rae Aldridge, February 8, 2013