August 7, 2014
Anyone on the lookout for a free intranet search system? FreewareFiles offers Arch Search Engine 1.7, also known as CSIRO Arch. The download weighs in at 22.28MB and runs on both 32-bit and 64-bit systems under Windows 2000 through Windows 7, as well as Mac OS and Mac OS X. Here’s part of the product description:
Arch is an open source extension of Apache Nutch (a popular, highly scalable general purpose search engine) for intranet search. Not happy with your corporate search engine? No surprise, very few people are. Arch (finally!) solves this problem. Don’t believe it? Try Arch, blind test evaluation tools are included.
In addition to excellent search quality, Arch has many features critical for corporate environments, such as document level security.
*Excellent search quality: Arch has solved the problem of providing good search results for corporate web sites and intranets!
*Up to date information: Arch is very efficient at updating indexes and this ensures that the search results are up to date and relevant. Unlike most search engines, no complete ‘recrawls’ are done. The indexes can be updated daily, with new pages discovered automatically.
*Multiple web sites: Arch supports easy dynamic inclusion or removal of websites.
They also say the system is easy to install and maintain; uses two indexes so there’s always a working one; and is customizable with either Java or PHP.
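The two-index idea can be sketched in a few lines. This is only an illustration of the general pattern (searches hit a live index while a standby copy is rebuilt, then the two swap). The `DualIndex` class and its methods are hypothetical names, not Arch's actual code or API.

```python
class DualIndex:
    """Hypothetical double-buffered index, not Arch's implementation."""

    def __init__(self):
        # Two inverted indexes (word -> set of URLs); only one is live at a time.
        self._indexes = [{}, {}]
        self._live = 0

    def search(self, term):
        # Queries always read the live index, even while a rebuild is running.
        return sorted(self._indexes[self._live].get(term.lower(), set()))

    def rebuild(self, documents):
        # Build a fresh inverted index off to the side.
        fresh = {}
        for url, text in documents.items():
            for word in text.lower().split():
                fresh.setdefault(word, set()).add(url)
        # Store it in the standby slot, then flip the live pointer.
        standby = 1 - self._live
        self._indexes[standby] = fresh
        self._live = standby


index = DualIndex()
index.rebuild({"intranet/hr": "leave policy", "intranet/it": "password policy"})
print(index.search("policy"))
```

The point of the design is that a failed or half-finished rebuild never touches the index that users are querying.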
Cynthia Murrell, August 07, 2014
July 16, 2014
If you are looking for an auto-summarization tool, TechCrunch says “Auto-Summarization Tool TextTeaser Relaunches As Open Source Code.” Jolo Balbin is the creator of TextTeaser, and he added it to GitHub after experiencing scalability issues in the API. Balbin recoded the program and the process is now faster. Developers have two plan options: one is $12 for every 1,000 articles summarized, while the enterprise plan is $250/month and comes with a dedicated server to store the article source.
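For anyone weighing the two plans, here is a rough break-even sketch. It assumes the metered plan is billed linearly at $12 per 1,000 articles, which the write up does not confirm, and the function names are made up for illustration.

```python
PER_THOUSAND_USD = 12.0         # metered plan price quoted above
ENTERPRISE_MONTHLY_USD = 250.0  # flat enterprise price quoted above


def metered_cost(articles):
    # Assumption: cost scales linearly, with no rounding up to blocks of 1,000.
    return articles / 1000 * PER_THOUSAND_USD


def break_even_articles():
    # Monthly volume at which the metered plan costs the same as the flat plan.
    return ENTERPRISE_MONTHLY_USD / PER_THOUSAND_USD * 1000


print(f"Break-even: about {break_even_articles():,.0f} articles per month")
```

Under that assumption, the flat plan only pays for itself somewhere past 20,000 articles a month.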
“ ‘In this TextTeaser, you can train your own summarizer,’ Balbin explains. ‘You can provide the category and source of the article that will be used to improve the quality of the summaries. In the future, users might also have the ability to provide what keyword is important and what is not.’ ”
TextTeaser is used in reader apps, such as Gist. Balbin hopes to optimize the program for medical, financial, and legal documents.
TextTeaser sounds like it makes reading faster. The code is a valuable tool. We will stay tuned to see how else it is used.
Whitney Grace, July 16, 2014
July 8, 2014
The article titled Hadoop Sector will Have Annual Growth of 58% for 2013-2020 in CloudTimes offers a wild and crazy market size estimate for the Hadoop sector. Hadoop is open source, so most of that money would have to be services revenue. Hadoop’s achievement is based on work in big data analysis, access to big data at high speeds, and the management of unstructured data. Keeping costs low while maintaining effectiveness spelled success for Hadoop. The article states,
“The report categorized the Hadoop software market into application software, management software, packaged software and performance monitoring software and found that application software category is leading the global Hadoop software market due to high return in its increasing implementation by developers to build real time applications. Also, Hadoop packaged software provides easier deployment of Hadoop clusters. Thus, Hadoop projects such as MapReduce, Sqoop, Hive and others can be smoothly integrated.”
The article does offer some caution to balance the wildly positive report for Hadoop. A shortage of qualified staff has slowed growth somewhat, especially among small and medium enterprises, which might hesitate to adopt the software. Hadoop is booming in government, manufacturing, BFSI, retail, and healthcare, among other sectors.
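To put the headline number in perspective, here is a quick compounding check. It assumes the 58% rate applies once per year across seven periods from 2013 to 2020. The report's exact base year and method are not given in the write up, so treat this as a back-of-the-envelope sketch.

```python
def growth_multiple(annual_rate, years):
    # Compound growth: the market multiplies by (1 + rate) each year.
    return (1 + annual_rate) ** years


# Assumption: seven compounding periods between 2013 and 2020.
multiple = growth_multiple(0.58, 2020 - 2013)
print(f"58% a year for 7 years is roughly a {multiple:.1f}x increase")
```

On those assumptions, the market would be more than twenty times its 2013 size by 2020, which is why "wild and crazy" seems a fair description.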
Chelsea Kerwin, July 08, 2014
July 3, 2014
If you are interested in the utility of open source information, you will want to pay particular attention to the disappearing content triggered by the EU’s right to be forgotten. Information is hard to find if the index has been scrubbed. I thought about the “disappearing” of information when I read “Out of Band.” The write up states:
Crowdsourcing and the wealth of networks are terms that are in vogue. What the government generally, and the secret world particularly, refuse to acknowledge is that information is a team sport and nature bats last. The government is only as good as its ability to do outreach, and if it relies on lies, nature—reality—will always reveal the truth at some future date.
Interesting point. However, when the most used source of information is filtering information, open source access becomes more important. With a single point of access, the reality becomes what’s findable. Will information access expand? Mr. Steele points out:
For the secret world, only a million-dollar custom-made shim will do, and they won’t notice if the beltway bandit sells them a piece of a beer can claiming it is the custom shim. I cannot overstate the ignorance and inattentiveness of today’s contracting officers and contracting officer technical representatives in the secret world.
In my view, his perspective applies to both commercial indexes and to government information methods. Fascinating. I keep wondering if Google is now the information government.
Stephen E Arnold, July 3, 2014
June 2, 2014
Maybe to the dismay of users, Microsoft winds up being cheaper long term than open source software. When it comes to total cost, Microsoft actually beats seemingly cheaper options once all investments in the system are considered. The topic is covered in a popular forum, Slashdot. Visit this thread to read more, “Microsoft Cheaper To Use Than Open Source Software, UK CIO Says.”
The discussion begins:
“Jos Creese, CIO of the Hampshire County Council, told Britain’s ‘Computing’ publication that part of the reason is that most staff are already familiar with Microsoft products and that Microsoft has been flexible and more helpful. ‘Microsoft has been flexible and helpful in the way we apply their products to improve the operation of our frontline services, and this helps to de-risk ongoing cost,’ he told the publication. ‘The point is that the true cost is in the total cost of ownership and exploitation, not just the license cost.’”
So while open source is enticing, it is possible that many organizations enter into open source implementations without considering the cost of customization, security, etc. and all the staffing time that goes with that. And while there may be good reasons to still go your own way with open source, it is best to do the research ahead of time and possibly consult with professionals who can look at the total cost of installation.
Emily Rae Aldridge, June 02, 2014
May 23, 2014
Short honk: Qink.net offers a useful list of freely available SharePoint libraries. You can find the listing at http://bit.ly/1lGc7tM. There is no major subcategory for “information retrieval.” There is a pointer to Apache’s Lucene.net page. After scanning the list, my thought was that search is not a mainstream focus for these freely available components.
Stephen E Arnold, May 23, 2014
May 13, 2014
It is time for people to understand that relational databases were not made to handle big data. There is just too much data jogging around in servers and mainframes and the terabytes run circles around relational database frameworks. It is sort of like a smart fox toying with a dim hunter. It is time that more robust and reliable software was used, like Hadoop. GCN says that there are “5 Ways Agencies Can Use Hadoop.”
Hadoop is an open source programming framework that spreads data across server clusters. It is faster and less expensive than proprietary software. The federal government is always searching for ways to slash costs, and if agencies turn to Hadoop they might save a bit on tech spending.
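For readers who have not seen the pattern behind Hadoop's MapReduce, here is a toy, single-machine word count. It only mimics the map, shuffle, and reduce steps in plain Python. It does not use Hadoop or any Hadoop API, and the function names are invented for this sketch.

```python
from collections import defaultdict


def map_phase(chunk):
    # Each chunk of text stands in for a block of data held on one node.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    # Group all values that share a key, as a cluster would before reducing.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Collapse each key's values into a single count.
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


print(word_count(["big data", "Big clusters"]))
```

In a real cluster the map calls would run in parallel on the machines that hold each block, which is where the speed and cost claims come from.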
“It is estimated that half the world’s data will be processed by Hadoop within five years. Hadoop-based solutions are already successfully being used to serve citizens with critical information faster than ever before in areas such as scientific research, law enforcement, defense and intelligence, fraud detection and computer security. This is a step in the right direction, but the framework can be better leveraged.”
The five ways the government can use Hadoop are to store and analyze unstructured and semi-structured data, to improve initial discovery and exploration, to make all data available for analysis, to serve as a staging area for data warehouses and analytic data stores, and to lower the cost of data storage.
So can someone explain why this has not been done yet?
May 7, 2014
Writer Christopher Tozzi opens his Var Guy article, “MapR, Elasticsearch Partner on Open Source Big Data Search,” with a good question: With so many Hadoop distributions out there, what makes one stand out? MapR hopes an integration with Elasticsearch will help them with that. The move brings to MapR, as the companies put it, “a scalable, distributed architecture to quickly perform search and discovery across tremendous amounts of information.” They report that several high-profile clients are already using the integrated platform.
Tozzi concludes with an interesting observation:
“From the channel perspective, the most important part of this story is about the open source Hadoop Big Data world becoming an even more diverse ecosystem where solutions depend on collaboration between a variety of independent parties. Companies such as MapR have been repackaging the core Hadoop code and distributing it in value-added, enterprise-ready form for some time, but Elasticsearch integration into MapR is a sign that Hadoop distributions also need to incorporate other open source Big Data technologies, which they do not build themselves, to maximize usability for the enterprise.”
It will be interesting to see how that need plays out throughout the field. MapR is headquartered in San Jose, California, and was launched in 2009. Formed in 2012, Elasticsearch is based in Amsterdam. Both Hadoop-happy companies maintain offices around the world, and each proudly counts some hefty organizations among their customers.
Cynthia Murrell, May 07, 2014
April 29, 2014
Microsoft is getting its open source on. Ars Technica reports, “Microsoft Open Sources a Big Chunk of .NET.” It seems the tech giant is softening its stance on open source resources; perhaps they now see they have little choice if the company wants to remain relevant. Writer Peter Bright reports:
“At its Build developer conference today [April 3, 2014], Microsoft announced that it was open sourcing a wide array of its .NET libraries and related technologies and creating a group, the .NET Foundation, to oversee the development and stewardship of the open source components.
“Perhaps the highlight of the announcement today was that the company will be releasing its Roslyn compiler stack as open source under the Apache 2.0 license. Roslyn includes a C# and Visual Basic.NET compiler, offering what Microsoft calls a ‘compiler as a service.’”
Included in the .NET Foundation are reps from Microsoft (of course), GitHub, and Xamarin. Xamarin and Microsoft have been collaborating for some time, and the former is contributing some of its own libraries to the Foundation. If Xamarin’s experience is any example, Microsoft really is making it easier to collaborate with them. Bright writes:
“We talked to Xamarin CTO Miguel de Icaza about working with Microsoft and the decision to make these components open source. For a long time, he said that while the engineers at the two companies had a good relationship, the decisions that Microsoft made—such as not allowing certain pieces of code to be used on non-Windows platforms—made things difficult for Xamarin.
“However, that changed late last year…. Last November, the companies announced that they were partnering in order to make it easier to use Xamarin’s tools to write code that works on both Microsoft and non-Microsoft platforms.”
Ah, cooperation! The article specifies that Microsoft has removed troublesome license restrictions, solicited design feedback from Xamarin, published docs under a Creative Commons license, and furnished Xamarin with its internal .NET test suite. Is this a sign of things to come? Stay tuned to see whether Microsoft continues to play well with others.
Cynthia Murrell, April 29, 2014
April 3, 2014
Elasticsearch is the favored open source search application and many startups have built their own products on top of the platform, increasing competition among the startups. InfoWorld lets us know that the competition is about to get stiffer in the article, “Logstash Steps Up As Splunk’s Latest Challenger.”
Splunk offers many big data solutions, including security, analytics, application management, and cloud services. The article explains that Logstash is part of a component stack that also includes Kibana and Elasticsearch. It is used to log data and can be configured to a user’s needs. It is an Apache-licensed open source endeavor and costs less than Splunk (it is either free or available with paid support plans). Elasticsearch has commercialized Logstash through its Marvel product.
It does not appear that Logstash is a direct competitor, but the article explains:
“So far, the biggest distinction between Splunk and its competition is how they’re productized. Splunk’s a proprietary item, but with the emphasis on it being a product and not simply a technology stack. The competition still largely consists of open source stacks rather than actual services, but it’s clear the gap between what Splunk offers at a cost and what others offer for free is closing.”
Another new service pressures Lucid Imagination and other search vendors to respond, and it makes investors impatient as Elasticsearch surges forward with bigger and better ideas. Search vendors are caught in the middle as they try to be competitive and earn a profit at the same time. Kudos to Elasticsearch and open source applications.