August 27, 2014
In Greek mythology, Cassandra had the gift of prophecy, but she was also cursed so that no one believed her. The NoSQL database of the same name had a similar problem when it first started, but unlike the tragic heroine it has since grown into a popular and profitable bit of code. Wired discusses Cassandra’s history and current endeavors in “Out In the Open: The Abandoned Facebook Tech That Now Helps Power Apple.”
Cassandra began at Facebook, which built it to better scale information across machines and open sourced it in 2008. It faded into the background for a while, but Jonathan Ellis co-founded DataStax around it, and the company continued to gain traction with its proprietary software. Apple has since joined the Cassandra community and is its second largest contributor. DataStax, however, will not acknowledge that Apple is one of its clients.
The article points out that a single database product cannot reign supreme in 2014’s market. New ways to house and utilize data will continue to emerge, many of them driven by open source. What does that mean for DataStax and Cassandra?
“Ellis says the strategy for Cassandra and DataStax will be ensuring that its technology can work with any new technology that can come along. For example, DataStax recently released a connector for Spark that will enable developers to easily use Spark to analyze data stored in Cassandra. ‘We’re trying to be the database that drives our application, not necessarily the analytics,’ he says. ‘There’s nothing that marries us to one of those platforms.’”
From reading this, it seems the big data push has quieted down somewhat, but companies based on open source software are trying to create products that let people use their data more intelligently and without the holdups of earlier big data pushes. One thing is for sure: if DataStax truly does have Apple as a client, it can kiss success on the mouth.
August 25, 2014
Just as Elasticsearch is reveling in its recent successes, CloudPro informs us that “Hackers Target Elasticsearch to Set Up DDoS Botnet on AWS.” Writer Rene Millman reports that cloud providers besides Amazon Web Services could be affected by the attacks, which leverage a vulnerability in the older Elasticsearch 1.1 versions. Because of its ability to run on multiple nodes, Elasticsearch’s open source, Java-based full-text-search application is a popular choice for use with cloud environments. The article describes the vulnerability hackers are now exploiting:
“Researchers at Kaspersky Labs have found that cybercriminals have exploited a flaw in the software to install DDoS malware on various clouds. The flaw was found in Elasticsearch v. 1.1x and a scripting exploit. The software has default support for active scripting, but does not use authentication and also does not sandbox the script code. Criminals can use the flaw to hack into EC2 VMs and then use a new variant of Linux DDoS Trojan Mayday – Backdoor.Linux.Mayday.g – to launch their attack, according to Kaspersky Lab principal security researcher Kurt Baumgartner.”
“The [Mayday variants] in use on compromised EC2 instances oddly enough were flooding sites with UDP traffic only. The flow is strong enough that the DDoS’d victims were forced to move from their normal hosting operations IP addresses to those of an anti-DDoS solution.
“The flow is also strong enough that Amazon is now notifying their customers, probably because of potential for unexpected accumulation of excessive resource charges for their customers. The situation is probably similar at other cloud providers.”
Unsurprisingly, the goal of these attacks seems to be financial. Baumgartner notes that among those affected by these attacks are a large regional U.S. bank, a large electronics maker, and a Japanese service provider. For its part, Amazon is urging users to upgrade as soon as possible to the latest version of Elasticsearch, which is free from this vulnerability.
Cynthia Murrell, August 25, 2014
August 24, 2014
Which is more economical? Proprietary software or open source software? Which approach delivers greater “value”? In Wal-Mart’s tussle with Amazon, will it deliver a better online experience for shopping, search, and logistics? I ask because the Wal-Mart closest to Harrod’s Creek has fewer products, dimmer lighting, and restocking challenges in my experience.
Some information that may help answer these questions appeared in “Wal-Mart’s Investment in Open Source Isn’t Cheap.” Note that this publication is owned by IDG / IDC, the mid tier consulting firm that sold my content on Amazon without my permission. Some details are at this link.
This write up explains that open source software is more than a price:
Wal-Mart has put in place a set of metrics to estimate the return on investment. Hammer explains “every five startups using Hapi translated to the value of one full-time developer, while every 10 large companies translated to one full-time senior developer.” In return for its extra work on open development, Wal-Mart gets high-quality programming at a cost far below that of recruiting and retaining extra staff. In turn, this demonstrable return allows the company to justify further development investment because “by paying developers to work on Hapi full time, we get back twice (or more) that much in engineering value.”
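The ratios Hammer quotes reduce to simple arithmetic. Here is a minimal sketch, assuming adopters are credited proportionally (the article does not say how partial groups are counted); the function name and the adopter counts are hypothetical, not figures from the write up.

```python
def hapi_developer_equivalents(startups: int, large_companies: int) -> dict:
    """Convert Hapi adopter counts into developer equivalents.

    Uses the quoted ratios: five startups per full-time developer,
    ten large companies per full-time senior developer.
    Proportional credit for partial groups is an assumption.
    """
    return {
        "full_time_developers": startups / 5,
        "senior_developers": large_companies / 10,
    }


# Hypothetical adopter counts, chosen only for illustration.
value = hapi_developer_equivalents(startups=20, large_companies=30)
print(value)
```

With these made-up counts, the metric credits Hapi with the equivalent of four full-time developers and three senior developers.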
Wal-Mart, however, is a place that sells stuff at what looks like low prices. There are some legal arabesques related to Wal-Mart’s parsimonious streak.
- Is Wal-Mart looking for ways to obtain maximum freedom from traditional vendors, not just value or cost savings? Freedom can translate to handling software the Wal-Mart way.
- Will developers find themselves subject to the same cost parameters that Wal-Mart has honed to deliver its competitive prices?
- How will Wal-Mart adapt when an open source project loses its community?
With Amazon looking more and more proprietary, Wal-Mart seems to be heading in the opposite direction. Will Wal-Mart out-Amazon Amazon, or will Wal-Mart become more like Amazon?
The search experience for both Amazon and Wal-Mart online is often frustrating. Perhaps in a few months one of these discounters will crack their information retrieval nuts.
For those looking for information about the cost of open source, the Wal-Mart approach is worth tucking into one’s card file.
Stephen E Arnold, August 24, 2014
August 7, 2014
Anyone on the lookout for a free intranet search system? FreewareFiles offers Arch Search Engine 1.7, also known as CSIRO Arch. The software will eat up 22.28MB, and works on both 32-bit and 64-bit systems running Windows 2000 through Windows 7 or MacOS or MacOS X. Here’s part of the product description:
Arch is an open source extension of Apache Nutch (a popular, highly scalable general purpose search engine) for intranet search. Not happy with your corporate search engine? No surprise, very few people are. Arch (finally!) solves this problem. Don’t believe it? Try Arch, blind test evaluation tools are included.
In addition to excellent search quality, Arch has many features critical for corporate environments, such as document level security.
*Excellent search quality: Arch has solved the problem of providing good search results for corporate web sites and intranets!
*Up to date information: Arch is very efficient at updating indexes and this ensures that the search results are up to date and relevant. Unlike most search engines, no complete ‘recrawls’ are done. The indexes can be updated daily, with new pages discovered automatically.
*Multiple web sites: Arch supports easy dynamic inclusion or removal of websites.
They also say the system is easy to install and maintain; uses two indexes so there’s always a working one; and is customizable with either Java or PHP.
Cynthia Murrell, August 07, 2014
July 16, 2014
If you are looking for an auto-summarization tool, TechCrunch says “Auto-Summarization Tool TextTeaser Relaunches As Open Source Code.” Jolo Balbin is the creator of TextTeaser, and he added it to GitHub after experiencing scalability issues in the API. Balbin recoded the program and the process is now faster. Developers have two plan options: one is $12 for every 1,000 articles summarized, while the enterprise plan is $250/month and comes with a dedicated server to store the article source.
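For anyone weighing the two plans, a quick break-even check helps. This is only a sketch: it assumes the metered plan is billed proportionally per article and that the enterprise plan has no per-article charge, neither of which the article confirms. The constant and function names are made up for illustration.

```python
# Prices as reported in the article.
PER_THOUSAND_USD = 12.0          # metered plan: $12 per 1,000 articles
ENTERPRISE_MONTHLY_USD = 250.0   # enterprise plan: flat $250 per month


def metered_cost(articles_per_month: int) -> float:
    """Monthly cost on the metered plan, assuming proportional billing."""
    return articles_per_month / 1000 * PER_THOUSAND_USD


def cheaper_plan(articles_per_month: int) -> str:
    """Name the cheaper plan for a given monthly volume."""
    if metered_cost(articles_per_month) < ENTERPRISE_MONTHLY_USD:
        return "metered"
    return "enterprise"


# Volume at which both plans cost the same under these assumptions.
break_even_articles = ENTERPRISE_MONTHLY_USD / PER_THOUSAND_USD * 1000

print(f"Break-even: about {break_even_articles:,.0f} articles per month")
print(cheaper_plan(10_000), cheaper_plan(25_000))
```

Under these assumptions the flat plan only pays off above roughly 20,800 articles a month, before counting whatever the dedicated server is worth to you.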
“ ‘In this TextTeaser, you can train your own summarizer,’ Balbin explains. ‘You can provide the category and source of the article that will be used to improve the quality of the summaries. In the future, users might also have the ability to provide what keyword is important and what is not.’ ”
TextTeaser is used in reader apps, such as Gist. Balbin hopes to optimize the program for medical, financial, and legal documents.
TextTeaser sounds like it makes reading faster. The code is a valuable tool. We will stay tuned to see how else it is used.
Whitney Grace, July 16, 2014
July 8, 2014
The article titled “Hadoop Sector will Have Annual Growth of 58% for 2013-2020” in CloudTimes offers a wild and crazy market size estimate for the sector. Hadoop is open source, so this is a lot of services revenue. Hadoop’s achievement is based on work in big data analysis, access to big data at high speeds, and the management of unstructured data. Keeping costs low while maintaining effectiveness spelled success for Hadoop. The article states,
“The report categorized the Hadoop software market into application software, management software, packaged software and performance monitoring software and found that application software category is leading the global Hadoop software market due to high return in its increasing implementation by developers to build real time applications. Also, Hadoop packaged software provides easier deployment of Hadoop clusters. Thus, Hadoop projects such as MapReduce, Sqoop, Hive and others can be smoothly integrated.”
The article does offer some caution to balance the wildly positive report for Hadoop. A shortage of qualified staff has slowed growth somewhat, especially among small and medium enterprises, which might hesitate to adopt the software. Hadoop is booming in government, manufacturing, BFSI, retail and healthcare, among other areas.
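To see why a 58% annual growth rate sustained from 2013 to 2020 is such an aggressive estimate, it helps to compound it. The sketch below assumes 2013 is the base year and that the rate applies for seven full years; the function name is hypothetical and no dollar figures from the report are used.

```python
ANNUAL_GROWTH = 0.58   # rate from the article's headline
START_YEAR = 2013      # assumed base year
END_YEAR = 2020


def growth_multiple(rate: float, years: int) -> float:
    """Total growth multiple after compounding `rate` for `years` years."""
    return (1 + rate) ** years


years = END_YEAR - START_YEAR  # seven compounding periods
multiple = growth_multiple(ANNUAL_GROWTH, years)
print(f"Market in {END_YEAR} is about {multiple:.1f}x its {START_YEAR} size")
```

Under those assumptions the sector would end the period at more than twenty times its starting size.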
Chelsea Kerwin, July 08, 2014
July 3, 2014
If you are interested in the utility of open source information, you will want to pay particular attention to the disappearing content triggered by the EU’s right to be forgotten. Information is hard to find if the index has been scrubbed. I thought about the “disappearing” of information when I read “Out of Band.” The write up states:
Crowdsourcing and the wealth of networks are terms that are in vogue. What the government generally, and the secret world particularly, refuse to acknowledge is that information is a team sport and nature bats last. The government is only as good as its ability to do outreach, and if it relies on lies, nature—reality—will always reveal the truth at some future date.
Interesting point. However, when the most used source of information is filtering information, open source access becomes more important. With a single point of access, the reality becomes what’s findable. Will information access expand? Mr. Steele points out:
For the secret world, only a million-dollar custom-made shim will do, and they won’t notice if the beltway bandit sells them a piece of a beer can claiming it is the custom shim. I cannot overstate the ignorance and inattentiveness of today’s contracting officers and contracting officer technical representatives in the secret world.
In my view, his perspective applies to both commercial indexes and to government information methods. Fascinating. I keep wondering if Google is now the information government.
Stephen E Arnold, July 3, 2014
June 2, 2014
Maybe to the dismay of users, Microsoft winds up being cheaper over the long term than open source software. When it comes to total cost, Microsoft actually beats seemingly cheaper options once all investments in the system are considered. The topic is covered in a popular forum, Slashdot. Visit this thread to read more, “Microsoft Cheaper To Use Than Open Source Software, UK CIO Says.”
The discussion begins:
“Jos Creese, CIO of the Hampshire County Council, told Britain’s ‘Computing’ publication that part of the reason is that most staff are already familiar with Microsoft products and that Microsoft has been flexible and more helpful. ‘Microsoft has been flexible and helpful in the way we apply their products to improve the operation of our frontline services, and this helps to de-risk ongoing cost,’ he told the publication. ‘The point is that the true cost is in the total cost of ownership and exploitation, not just the license cost.’”
So while open source is enticing, it is possible that many organizations enter into open source implementations without considering the cost of customization, security, etc. and all the staffing time that goes with that. And while there may be good reasons to still go your own way with open source, it is best to do the research ahead of time and possibly consult with professionals who can look at the total cost of installation.
Emily Rae Aldridge, June 02, 2014
May 23, 2014
Short honk: Qink.net offers a useful list of freely available SharePoint libraries. You can find the listing at http://bit.ly/1lGc7tM. There is no major subcategory for “information retrieval.” There is a pointer to Apache’s Lucene.net page. After scanning the list, my thought was that search is not a mainstream focus for these freely available components.
Stephen E Arnold, May 23, 2014
May 13, 2014
It is time for people to understand that relational databases were not made to handle big data. There is just too much data jogging around in servers and mainframes and the terabytes run circles around relational database frameworks. It is sort of like a smart fox toying with a dim hunter. It is time that more robust and reliable software was used, like Hadoop. GCN says that there are “5 Ways Agencies Can Use Hadoop.”
Hadoop is an open source programming framework that spreads data across server clusters. It is faster and less expensive than proprietary software. The federal government is always searching for ways to slash costs, and if it turns to Hadoop it might save a bit on tech.
“It is estimated that half the world’s data will be processed by Hadoop within five years. Hadoop-based solutions are already successfully being used to serve citizens with critical information faster than ever before in areas such as scientific research, law enforcement, defense and intelligence, fraud detection and computer security. This is a step in the right direction, but the framework can be better leveraged.”
The five ways the government can use Hadoop are storing and analyzing unstructured and semi-structured data, improving initial discovery and exploration, making all data available for analysis, serving as a staging area for data warehouses and analytic data stores, and lowering data storage costs.
So can someone explain why this has not been done yet?