July 16, 2014
If you are looking for an auto-summarization tool, TechCrunch reports that “Auto-Summarization Tool TextTeaser Relaunches As Open Source Code.” Joe Balbin, the creator of TextTeaser, put it on GitHub after the API ran into scalability issues. Balbin recoded the program, and the process is now faster. Developers have two plan options: one costs $12 for every 1,000 articles summarized, while the enterprise plan is $250 per month and comes with a dedicated server to store the article source.
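For readers weighing the two plans, here is a rough Python sketch of the break-even point. It rests on assumptions the article does not state: that per-article pricing scales linearly at $12 per 1,000 articles, and that the enterprise plan is a flat $250 per month with no per-article charge. Under those assumptions the plans cost the same at 250 / 12 × 1,000, or roughly 20,833 articles per month.

```python
# Hypothetical cost comparison of the two TextTeaser plans described above.
# Assumptions (not from the article): per-article pricing is linear at
# $12 per 1,000 articles, and the enterprise plan is a flat $250/month.

PER_1000_PRICE = 12.0       # USD per 1,000 summarized articles
ENTERPRISE_MONTHLY = 250.0  # USD per month, assumed flat


def pay_as_you_go_cost(articles_per_month: int) -> float:
    """Monthly cost on the per-article plan, assuming linear pricing."""
    return articles_per_month / 1000 * PER_1000_PRICE


def break_even_articles() -> float:
    """Monthly volume at which both plans cost the same."""
    return ENTERPRISE_MONTHLY / PER_1000_PRICE * 1000


def cheaper_plan(articles_per_month: int) -> str:
    """Name the cheaper plan for a given monthly volume."""
    payg = pay_as_you_go_cost(articles_per_month)
    return "pay-as-you-go" if payg < ENTERPRISE_MONTHLY else "enterprise"


if __name__ == "__main__":
    print(round(break_even_articles()))
    for volume in (5_000, 20_000, 50_000):
        print(volume, cheaper_plan(volume), pay_as_you_go_cost(volume))
```

This compares cost only; the enterprise plan's dedicated server is a separate consideration the sketch ignores.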
“ ‘In this TextTeaser, you can train your own summarizer,’ Balbin explains. ‘You can provide the category and source of the article that will be used to improve the quality of the summaries. In the future, users might also have the ability to provide what keyword is important and what is not.’ ”
TextTeaser is used in reader apps, such as Gist. Balbin hopes to optimize the program for medical, financial, and legal documents.
TextTeaser sounds like it makes reading faster. The code is a valuable tool. We will stay tuned to see how else it is used.
Whitney Grace, July 16, 2014
July 8, 2014
The article titled “Hadoop Sector will Have Annual Growth of 58% for 2013-2020” in CloudTimes offers a wild and crazy market size estimate for Hadoop. Hadoop is open source, so this is a lot of services revenue. Hadoop’s achievement is based on work in big data analysis, access to big data at high speeds, and the management of unstructured data. Keeping costs low while maintaining effectiveness spelled success for Hadoop. The article states,
“The report categorized the Hadoop software market into application software, management software, packaged software and performance monitoring software and found that application software category is leading the global Hadoop software market due to high return in its increasing implementation by developers to build real time applications. Also, Hadoop packaged software provides easier deployment of Hadoop clusters. Thus, Hadoop projects such as MapReduce, Sqoop, Hive and others can be smoothly integrated.”
The article does offer some caution to balance the wildly positive report for Hadoop. Because of a shortage of qualified staff, growth has slowed somewhat, especially among small and medium enterprises, which might hesitate to adopt the software. Still, Hadoop is booming in government, manufacturing, BFSI (banking, financial services, and insurance), retail, and healthcare, among other sectors.
Chelsea Kerwin, July 08, 2014
July 3, 2014
If you are interested in the utility of open source information, you will want to pay particular attention to the disappearing content triggered by the EU’s right to be forgotten. Information is hard to find if the index has been scrubbed. I thought about the “disappearing” of information when I read “Out of Band.” The write up states:
Crowdsourcing and the wealth of networks are terms that are in vogue. What the government generally, and the secret world particularly, refuse to knowledge is that information is a team sport and nature bats last. The government is only as good as its ability to do outreach, and if it relies on lies, nature—reality—will always reveal the truth at some future date.
Interesting point. However, when the most used source of information is filtering that information, open source access becomes more important. With a single point of access, reality becomes what is findable. Will information access expand? Mr. Steele points out:
For the secret world, only a million-dollar custom-made shim will do, and they won’t notice if the beltway bandit sells them a piece of a beer can claiming it is the custom shim. I cannot overstate the ignorance and inattentiveness of today’s contracting officers and contracting officer technical representatives in the secret world.
In my view, his perspective applies to both commercial indexes and to government information methods. Fascinating. I keep wondering if Google is now the information government.
Stephen E Arnold, July 3, 2014
June 2, 2014
Maybe to the dismay of some users, Microsoft can wind up being cheaper than open source software in the long run. When total cost is considered, Microsoft can actually come out ahead of seemingly cheaper options once all investments in the system are counted. The topic is covered on the popular forum Slashdot. Visit this thread to read more: “Microsoft Cheaper To Use Than Open Source Software, UK CIO Says.”
The discussion begins:
“Jos Creese, CIO of the Hampshire County Council, told Britain’s ‘Computing’ publication that part of the reason is that most staff are already familiar with Microsoft products and that Microsoft has been flexible and more helpful. ‘Microsoft has been flexible and helpful in the way we apply their products to improve the operation of our frontline services, and this helps to de-risk ongoing cost,’ he told the publication. ‘The point is that the true cost is in the total cost of ownership and exploitation, not just the license cost.’”
So while open source is enticing, many organizations may enter into open source implementations without considering the cost of customization, security, and the like, plus all the staff time that goes with them. There may still be good reasons to go your own way with open source, but it is best to do the research ahead of time and possibly consult professionals who can assess the total cost of installation.
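Creese's point that the true cost lies in total cost of ownership, not just the license, boils down to a simple sum. The Python sketch below is a hypothetical model: the cost categories (license, customization, staff time, training) are assumptions drawn loosely from the discussion above, not from the Hampshire County Council's own analysis, and the sample inputs are purely illustrative, not real data.

```python
from dataclasses import dataclass


@dataclass
class CostProfile:
    """Hypothetical cost inputs for one software option, all in one currency."""
    annual_license: float          # recurring license or subscription fee
    one_time_customization: float  # integration, customization, security work
    annual_staff_hours: float      # ongoing administration and support time
    hourly_staff_rate: float       # loaded cost of one staff hour
    one_time_training: float       # retraining staff unfamiliar with the product


def total_cost_of_ownership(p: CostProfile, years: int) -> float:
    """Sum one-time costs plus recurring costs over the given number of years."""
    one_time = p.one_time_customization + p.one_time_training
    recurring = years * (p.annual_license + p.annual_staff_hours * p.hourly_staff_rate)
    return one_time + recurring


if __name__ == "__main__":
    # Illustrative numbers only; substitute your own estimates.
    proprietary = CostProfile(10_000, 5_000, 200, 50, 0)
    open_source = CostProfile(0, 30_000, 400, 50, 8_000)
    for name, profile in (("proprietary", proprietary), ("open source", open_source)):
        print(name, total_cost_of_ownership(profile, years=3))
```

The model shows how a zero license fee can still lose once one-time customization and ongoing staff time are counted, which is the trade-off the paragraph above warns about.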
Emily Rae Aldridge, June 02, 2014
May 23, 2014
Short honk: Qink.net offers a useful list of freely available SharePoint libraries. You can find the listing at http://bit.ly/1lGc7tM. There is no major subcategory for “information retrieval.” There is a pointer to Apache’s Lucene.net page. After scanning the list, my thought was that search is not a mainstream focus for these freely available components.
Stephen E Arnold, May 23, 2014
May 13, 2014
It is time for people to understand that relational databases were not made to handle big data. There is just too much data jogging around in servers and mainframes and the terabytes run circles around relational database frameworks. It is sort of like a smart fox toying with a dim hunter. It is time that more robust and reliable software was used, like Hadoop. GCN says that there are “5 Ways Agencies Can Use Hadoop.”
Hadoop is an open source programming framework that spreads data across server clusters. It is faster and less expensive than proprietary software. The federal government is always searching for ways to cut costs, and if it turns to Hadoop it might save a bit on technology.
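Hadoop's best-known processing model, MapReduce, runs a map step on each server's slice of the data, groups the intermediate results by key, and then runs a reduce step to combine them. The sketch below imitates that flow for a word count in a single Python process. It is not Hadoop's API; the "partitions" are a hypothetical stand-in for the data blocks Hadoop would spread across a cluster.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(partition: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one partition of the input."""
    for line in partition:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Combine each key's values into a final count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions: list[list[str]]) -> dict[str, int]:
    # On a real cluster each partition would be mapped on a different node;
    # here the map calls simply run one after another in one process.
    mapped = chain.from_iterable(map_phase(p) for p in partitions)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    partitions = [
        ["big data on many servers"],
        ["many servers share big jobs"],
    ]
    print(word_count(partitions))
```

Because each map call only sees its own partition, the work can be split across machines; only the grouped intermediate pairs need to move between them.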
“It is estimated that half the world’s data will be processed by Hadoop within five years. Hadoop-based solutions are already successfully being used to serve citizens with critical information faster than ever before in areas such as scientific research, law enforcement, defense and intelligence, fraud detection and computer security. This is a step in the right direction, but the framework can be better leveraged.”
The five ways the government can use Hadoop are storing and analyzing unstructured and semi-structured data, improving initial discovery and exploration, making all data available for analysis, serving as a staging area for data warehouses and analytic data stores, and lowering data storage costs.
So can someone explain why this has not been done yet?
May 7, 2014
Writer Christopher Tozzi opens his Var Guy article, “MapR, Elasticsearch Partner on Open Source Big Data Search,” with a good question: with so many Hadoop distributions out there, what makes one stand out? MapR hopes an integration with Elasticsearch will be its answer. The move brings to MapR, as the companies put it, “a scalable, distributed architecture to quickly perform search and discovery across tremendous amounts of information.” They report that several high-profile clients are already using the integrated platform.
Tozzi concludes with an interesting observation:
“From the channel perspective, the most important part of this story is about the open source Hadoop Big Data world becoming an even more diverse ecosystem where solutions depend on collaboration between a variety of independent parties. Companies such as MapR have been repackaging the core Hadoop code and distributing it in value-added, enterprise-ready form for some time, but Elasticsearch integration into MapR is a sign that Hadoop distributions also need to incorporate other open source Big Data technologies, which they do not build themselves, to maximize usability for the enterprise.”
It will be interesting to see how that need plays out throughout the field. MapR is headquartered in San Jose, California, and was launched in 2009. Formed in 2012, Elasticsearch is based in Amsterdam. Both Hadoop-happy companies maintain offices around the world, and each proudly counts some hefty organizations among its customers.
Cynthia Murrell, May 07, 2014
April 29, 2014
Microsoft is getting its open source on. Ars Technica reports, “Microsoft Open Sources a Big Chunk of .NET.” It seems the tech giant is softening its stance on open source resources; perhaps it now sees that it has little choice if it wants to remain relevant. Writer Peter Bright reports:
“At its Build developer conference today [April 3, 2014], Microsoft announced that it was open sourcing a wide array of its .NET libraries and related technologies and creating a group, the .NET Foundation, to oversee the development and stewardship of the open source components.
“Perhaps the highlight of the announcement today was that the company will be releasing its Roslyn compiler stack as open source under the Apache 2.0 license. Roslyn includes a C# and Visual Basic.NET compiler, offering what Microsoft calls a ‘compiler as a service.’”
Included in the .NET Foundation are representatives from Microsoft (of course), GitHub, and Xamarin. Xamarin and Microsoft have been collaborating for some time, and the former is contributing some of its own libraries to the Foundation. If Xamarin’s experience is any example, Microsoft really is making it easier to collaborate with them. Bright writes:
“We talked to Xamarin CTO Miguel de Icaza about working with Microsoft and the decision to make these components open source. For a long time, he said that while the engineers at the two companies had a good relationship, the decisions that Microsoft made—such as not allowing certain pieces of code to be used on non-Windows platforms—made things difficult for Xamarin.
“However, that changed late last year…. Last November, the companies announced that they were partnering in order to make it easier to use Xamarin’s tools to write code that works on both Microsoft and non-Microsoft platforms.”
Ah, cooperation! The article specifies that Microsoft has removed troublesome license restrictions, solicited design feedback from Xamarin, published docs under a Creative Commons license, and furnished Xamarin with its internal .NET test suite. Is this a sign of things to come? Stay tuned to see whether Microsoft continues to play well with others.
Cynthia Murrell, April 29, 2014
April 3, 2014
Elasticsearch is the favored open source search application, and many startups have built their own products on top of the platform, which increases competition among them. InfoWorld reports that the competition is about to get stiffer in the article “Logstash Steps Up As Splunk’s Latest Challenger.”
Splunk offers many big data solutions, including security, analytics, application management, and cloud services. The article explains that Logstash is part of a component stack that also includes Kibana and Elasticsearch. It is used to collect and process log data and can be configured to a user’s needs. It is an Apache-licensed open source project and costs less (it is free, with optional paid support plans). Elasticsearch has commercialized Logstash through its Marvel product.
It does not appear that Logstash is a direct competitor, but the article explains:
“So far, the biggest distinction between Splunk and its competition is how they’re productized. Splunk’s a proprietary item, but with the emphasis on it being a product and not simply a technology stack. The competition still largely consists of open source stacks rather than actual services, but it’s clear the gap between what Splunk offers at a cost and what others offer for free is closing.”
Another new service pressures Lucid Imagination and other search vendors to respond, which also makes investors impatient as Elasticsearch surges forward with bigger and better ideas. Search vendors are caught in the middle as they try to stay competitive and earn a profit at the same time. Kudos to Elasticsearch and open source applications.
April 2, 2014
OpenCalais is an open source project that creates rich semantic data by using natural language processing and other analytical methods through a Web service interface. That is a simple explanation for a powerful piece of software. OpenCalais was originally part of ClearForest, but Thomson Reuters acquired the project in 2007. Instead of marketing OpenCalais as proprietary software, Reuters allowed it to remain open. OpenCalais has since become a valued open source metadata tool that is used on everything from blogs to specialized museum collections.
Many notable sites use OpenCalais, and a sample can be found in “The List Of OpenCalais Implementations Grows.”
OpenCalais is excited about the new additions to the list:
“Add 10 to the list of innovative sites and services that use OpenCalais to reduce costs, deliver compelling content experiences and mine the social web for insight. See our press release for more details on each. We are thrilled to recognize the following new sites and services that are changing the way we engage with news and the social Web. They join a growing number of others in media, publishing, blogging, and news aggregation who use OpenCalais.”
Among them are The New Republic, Al Jazeera’s English blogging news networks, Slate Magazine’s blogging network, and I*heart* Sea. Not only do news Web sites use OpenCalais, but news aggregation apps do as well, including Feedly, DocumentCloud, and OpenPublish. Expect the list to grow even longer, and consider OpenCalais for your own metadata solution.