Short Honk: Hadoop Ecosystem Made Clear
January 3, 2016
Love Hadoop. Love all things Hadoopy? You will want to navigate to “The Hadoop Ecosystem Table.” You have categories of Hadoopiness with examples of the Hadoop amoebae. You are able to see where Spark “fits” or Kudu. Need some document data model options? The table will deliver: ArangoDB and more. Useful stuff.
Stephen E Arnold, December 30, 2015
The Importance of Google AI
December 23, 2015
According to Business Insider, we’ve all been overlooking something crucial about Google. Writer Lucinda Shen reports, “Top Internet Analyst: There Is One Thing About Google that Everyone Is Missing.” Shen cites an observation by prominent equity analyst Carlos Kirjner. She writes:
“To Kirjner, that thing [that everyone else is missing] is AI at Google. ‘Nobody is paying attention to that because it is not an issue that will play out in the next few quarters, but longer term it is a big, big opportunity for them,’ he said. ‘Google’s investments in artificial intelligence, above and beyond the use of machine learning to improve character, photo, video and sound classification, could be so revolutionary and transformational to the point of raising ethical questions.’
“Even if investors and analysts haven’t been closely monitoring Google’s developments in AI, the internet giant is devoted to the project. During the company’s third-quarter earnings call, CEO Sundar Pichai told investors the company planned to integrate AI more deeply within its core business.”
Google must be confident in its AI if it is deploying it across all its products, as reported. Shen recalls that the company made waves back in November, when it released the open-source AI platform TensorFlow. Is Google’s AI research about to take the world by storm?
Cynthia Murrell, December 23, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Open Source Survey: One Big Surprise about Code Management
November 23, 2015
I read “Awfully Pleased to Meet You: Survey Finds Open Source Needs More Formal Policies.”
The fact that eight out of 10 outfits in the sample were using open source software was no surprise. The sponsor of the survey is open source centric.
The point I highlighted was:
According to the study, less than 42% of organizations maintain an IT Asset Management (ITAM) style inventory of open source components.
When I read this, I thought, “Who keeps track of the open source components?”
The answer in more than half the companies in the sample was, “Huh? What?”
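For a sense of what the most basic inventory of open source components might look like, here is a minimal sketch. It assumes a Python environment only and uses the standard library's importlib.metadata to list installed packages. The function names inventory and undeclared and the UNKNOWN and UNDECLARED placeholders are my own illustrations, not anything from the survey or from Black Duck. A real ITAM style inventory would cover far more than one language's packages.

```python
from importlib import metadata


def inventory(dists=None):
    """Return one record per installed Python distribution, sorted by name."""
    if dists is None:
        # Standard-library call: iterates over the installed distributions.
        dists = metadata.distributions()
    rows = []
    for dist in dists:
        meta = dist.metadata
        rows.append({
            "name": meta.get("Name") or "UNKNOWN",
            "version": meta.get("Version") or "UNKNOWN",
            # Many packages leave this field empty and declare a license elsewhere.
            "license": meta.get("License") or "UNDECLARED",
        })
    return sorted(rows, key=lambda row: row["name"].lower())


def undeclared(rows):
    """Names of components whose License metadata field is empty."""
    return [row["name"] for row in rows if row["license"] == "UNDECLARED"]


rows = inventory()
print(f"{len(rows)} components, {len(undeclared(rows))} with no License field")
```

An empty License field does not mean a component is unlicensed; many packages state the license in other metadata. The sketch only shows how quickly the "what do we have?" question can be asked, not how it gets governed.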
I circled this point:
Shipley [Black Duck top dog] has also added the following comment, “In the results this year, it has become more evident that companies need their management and governance of open source to catch up to their usage. This is critical to reducing potential security, legal, and operational risks while allowing companies to reap the full benefits OSS provides.”
Is buying management the reason companies spend money with commercial open source plays? If that is the case, the successful commercial open source outfit is the one that has the ability to manage, not the technology and trends the marketers at certain commercial open source companies hype.
Stephen E Arnold, November 23, 2015
Lucidworks: Another $21 Million in Funding
November 19, 2015
Lucidworks (an eight year old “start up” founded in 2007) has raised an additional $21 million in funding. According to Crunchbase, the total funds injected into the open source centric company is now $53 million.
The news release “Lucidworks Announces $21 Million in Series D Funding” states:
Lucidworks, the chosen search solution for leading brands and organizations around the world, today announced $21 million in new financing. Allegis Capital led the round with participation from existing investors Shasta Ventures and Granite Ventures. Lucidworks will use the funds to accelerate its product-focused mission enabling companies to translate massive amounts of data into actionable business intelligence.
The statement included this observation attributed to Spencer Tall, Allegis Capital:
Lucidworks has proven itself, not only by providing the software and solutions that businesses need to benefit from Lucene/Solr search, but also by expanding its vision with new products like Fusion that give companies the ability to fully harness search technology suiting their particular customers. We fully support Lucidworks, not only for what it has achieved to date — disruptive search solutions that offer real, immediate benefits to businesses — but for the promising future of its product technology.
Lucidworks, formerly Lucid Imagination, competes with Elastic. Companies from IBM to OpenSearchServer offer solutions which compete in the same market sector. Elastic’s funding is in the $104 million range.
The horses are away from the starting gate. And the winner will be a steed with the best jockey? Stay tuned because the track is muddy.
Stephen E Arnold, November 19, 2015
On the Prevalence of Open Source
November 11, 2015
Who would have thought, two decades ago, that open source code was going to dominate the software field? Vallified’s Philip O’Toole meditates on “The Strange Economics of Open-Source Software.” Though the industry gives so much away for free, it’s doing quite well for itself.
O’Toole notes that closed-source software is still in wide use, largely in banks’ embedded devices and underpinning services. Also, many organizations are still attached to their Microsoft and Oracle products. But the tide has been turning; he writes:
“The increasing dominance of open-source software seems particularly true with respect to infrastructure software. While security software has often been open-source through necessity — no-one would trust it otherwise — infrastructure is becoming the dominant category of open-source. Look at databases — MySQL, MongoDB, RethinkDB, CouchDB, InfluxDB (of which I am part of the development team), or cockroachdb. Is there anyone today that would even consider developing a new closed-source database? Or take search technology — elasticsearch, Solr, and bleve — all open-source. And Linux is so obvious, it is almost pointless to mention it. If you want to create a closed-source infrastructure solution, you better have an enormously compelling story, or be delivering it as part of a bigger package such as a software appliance.”
It has gotten to the point where developers may hesitate to work on a closed-source project because it will do nothing for their reputation. Where do the profits come from, you may ask? Why, in the sale of services, of course. It’s all part of today’s cloud-based reality.
Cynthia Murrell, November 11, 2015
Google Uses Ninja Death Strike for Smart Software
November 10, 2015
I read “Google Tries an Android for Machine Learning, Releasing Open Source AI System.” The write up draws a parallel with Google’s Android strategy. The idea is to make something available in order to get developers and then eye balls.
I noted this paragraph:
The best explanatory quote comes from Greg Corrado, a senior researcher, in Google’s video on the system, embedded below: “There should really be one set of tools that researchers can use to try out their crazy ideas. And if those ideas work, they can move them directly into products without having to rewrite the code.”
The article mentions that the monopolists in hope and practice are into smart software. Smart software means 24×7 analytic type activity without humans. Better. Faster. Cheaper. More lucrative if one outfit sweeps up most of the activity. The goal is advertising and a reasonable chance at the type of market dominance that warmed the cockles of Andrew Carnegie’s heart.
There is one idea which caught my attention. The article and most of the others about this announcement did not mention the erstwhile leader of cognitive computing. IBM Watson is smart software, and it has a DNA anchored in open source, acquired technology, and the scripts of IBM researchers.
IBM Watson wants and needs its smart software to become a $1 billion business and pronto. Then IBM needs Watson to generate tens or hundreds of billions for the Big Blue stakeholders.
IBM is not an outfit comfortable with giving software away. I think that IBM will have to do a rethink and tap into Watson’s capabilities to find a tactic to get its smart software mojo back.
Did Google craft its open source play to blunt IBM? Nah. Google just wants to be Googley because being Alphabetty does not have the same cachet.
Does the Alphabet Google thing have a heart of gold and a weaponized open source strategy? Interesting question.
Stephen E Arnold, November 10, 2015
Open Source: A Bad Fit for Corporations?
November 9, 2015
I read “Corporations and OSS Do Not Mix.” The write up fooled me. I thought the approach was going to be that proprietary software vendors and open source code may find themselves at odds.
I was wrong.
The article explains that open source software and commercial organizations bump into licensing issues and some real world hurdles. The article states:
the joy and enthusiasm that I had when I started working on open source has been flattened. My attitude was naïve at best – this is fun and maybe I’m helping some other people do good and have fun too. This is also how a lot of my friends presently view their projects.
The list of challenges ranges from the selfishness of the commercial enterprise to dumb requests.
I also noted this passage:
Open source software is full of toxic people. This certainly shouldn’t be a surprise at this point. I would guess that it is safe to say that pretty much every person (including myself, I’m certainly not exempt from this) has had bad days and reacted poorly when dealing with the community, contributors, colleagues, etc. These are not excuses and these events can (and often do) shape the behaviors of the community and those observing it.
The article includes a list of positive ideas.
My hunch is that search vendors with proprietary software will become aggressive disseminators of the anti-open source possibilities of this write up.
That’s what makes search and content processing such credible business sectors.
Stephen E Arnold, November 9, 2015
Datafari Ventures into the Enterprise Search Jungle
October 29, 2015
A less-than-enthusiastic reader called our attention to Datafari, a new explorer of the enterprise search jungle. The software uses Solr and contains “the heart of a CMS.” The Datafari Web site explains:
A CMS allows for organizing collaboration within a company. But it is never monolithic, and only a federated search engine can find the data wherever they are.
Datafari, Version 2.0 is explained in a video at this link. The system permits key word search and offers a point-and-click sidebar to facilitate exploration of the content.
A user can save a particular document to a Favorites folder. The system administrator can view log file data in a graphical format. Hit boosting is available as well.
A live demonstration is available at this link. When I visited the site, it appeared that I needed to load my own content into the system. I decided against taking this step.
If you are looking for an enterprise search system that can double as a content management system, Datafari may be for you. The company is located in France, so a trip for training could be an added bonus.
Stephen E Arnold, October 29, 2015
Short Honk: Crawl the Web at Scale
September 30, 2015
Short honk: I read “Aduana: Link Analysis to Crawl the Web at Scale.” The write up explains an open source project which can copy content “dispersed all over the Web.” Keep in mind that the approach focuses primarily on text. Aduana is a special back end for the developer’s tool for speeding up crawls which is built on top of a data management system.
According to the write up:
we wanted to locate relevant pages first rather than on an ad hoc basis. We also wanted to revisit the more interesting ones more often than the others. We ultimately ran a pilot to see what happens. We figured our sheer capacity might be enough. After all, our cloud-based platform’s users scrape over two billion web pages per month….We think Aduana is a very promising tool to expedite broad crawls at scale. Using it, you can prioritize crawling pages with the specific type of information you’re after. It’s still experimental. And not production-ready yet.
In its present form, Aduana is able to:
- Analyze news.
- Search locations and people.
- Perform sentiment analysis.
- Find companies to classify them.
- Extract job listings.
- Find all sellers of certain products.
The write up contains links to the relevant github information, some code snippets, and descriptive information.
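The prioritization idea in the quoted passage, fetching the more relevant pages first, can be illustrated with a toy crawl frontier. This is a minimal sketch and not Aduana's code or API: the CrawlFrontier class, the example.com URLs, and the relevance scores are all made up for illustration. Aduana derives its priorities from link analysis, per the title of the write up; here the scores are simply handed in.

```python
import heapq
import itertools


class CrawlFrontier:
    """Toy priority frontier: higher-scored URLs are handed out first."""

    def __init__(self):
        self._heap = []                    # entries: (-score, tie_breaker, url)
        self._counter = itertools.count()  # preserves insertion order on ties
        self._seen = set()                 # URLs that have already been queued

    def add(self, url, score):
        """Queue a URL once, with a hypothetical relevance score."""
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so negate the score to pop the largest first.
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self):
        """Return the most relevant queued URL, or None when nothing is left."""
        if not self._heap:
            return None
        _, _, url = heapq.heappop(self._heap)
        return url


frontier = CrawlFrontier()
frontier.add("http://example.com/about", 0.1)
frontier.add("http://example.com/jobs", 0.9)
frontier.add("http://example.com/news", 0.5)

order = []
url = frontier.pop()
while url is not None:
    order.append(url)
    url = frontier.pop()
print(order)
```

The sketch covers only the "relevant pages first" half of the quote. Revisiting interesting pages more often would need a re-queue step with a schedule, which is left out here.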
Stephen E Arnold, September 30, 2015
Spark: Another Open Source Game Changer
September 24, 2015
Gentle reader, I know that knowledge about Spark is as widespread as information about the woes of the Philadelphia Eagles. My understanding of Spark is that it is an open source engine for large scale data processing. It is faster than Hadoop. It is easy to use. It is flexible enough to allow the intrepid Spark aficionado to combine structured query language, streaming, and analytics in one software system. Spark runs “everywhere.” For more about Spark, see this Apache project page.
Spark is one of the next big things, poised to ignite innovation, consulting revenues, and vendor repositionings.
I approached “Game-Changing Real-time Uses for Apache Spark” in order to learn how Spark can change the game for real time data and information work. Game changing means that old school outfits are going to lose because the new game has new rules, new players, and new everything.
The write up identified these ways Spark will change some quite significant markets:
- Credit card fraud detection
- Network security
- Genomic sequencing
- Real time ad processing
- Medical
My goodness, Spark will become the number one enabling technology for some very problematic market spaces.
Let’s look at what Spark will do to real time ad processing. The write up reports:
One advertising firm uses Spark, on MapR-DB, to build a real-time ad targeting platform. The system looks at user data and decides which ads to show users on the Internet based on demographic data. Since advertising is so time-sensitive, advertisers have to move fast if they want to capture mindshare. Spark Streaming is one way to help them do that.
What strikes me is that Spark requires programmers, software engineering, and then integration of different components. If an error manifests itself, the Spark solution may require those who embrace it to perform some old fashioned work.
In a sense, the game hasn’t changed at all. Open source software reduces license fees and provides a developer with some freedom from license restrictions. On the other hand, the difficult task of getting a complex system to work as intended remains.
My hunch is that Spark is an interesting open source project. The consultants and start ups see Spark as an opportunity. The game changing nature of Spark is potential energy, not a sure thing.
Stephen E Arnold, September 23, 2015