Crazy, Wild Hadoop Prioritization Advice

November 12, 2015

I read “Top 10 Priorities for a Successful Hadoop Implementation.” A listicle. I understand. Clicks. Visibility. Fame. Fortune. Well, hopefully.

I want to highlight two pieces of advice delivered in a somber, parental manner, intended to help a Hadoop administrator get ‘er done and keep the paychecks rolling in.

Item 2 of 10: “Innovate with Big Data on enterprise Hadoop.” I find it amusing when advisors, poobahs, and former middle school teachers tell another person to innovate. Yep, that works really well. Even those who innovate face failure many times. I think the well ran dry for some of the Italian Renaissance artists when the examples of frescoes in Nero’s modest home were recycled. Been there. Done that. The notion of a person innovating with an enterprise deployment of Hadoop strikes me as interesting, but probably not a top 10 priority. How about getting the data into the system, formulating a meaningful query, and figuring out how to deal with Hadoop’s batch orientation?

Item 9 of 10: “Look for capabilities that make Hadoop data look relational.” There are reasons to use Codd-type data management systems: they work when properly set up, and they require data which can be sliced and diced. Maybe not easily, but no one fools himself or herself thinking, “Gee, why don’t I dump everything into one big data lake and pull out the big, glossy fish automagically.”

I am okay with advice. Perhaps it should reflect the reality that open source data management tools present to an enterprise user seeking guidance. Enterprise search vendors got themselves into a world of hurt with this type of casual advice. Where are those vendors now?

Stephen E Arnold, November 12, 2015

Amazon Punches Business Intelligence

November 11, 2015

Amazon already gave technology a punch when it launched AWS, and now it is releasing a business intelligence application that will change the face of business operations, or so Amazon hopes. ZDNet describes Amazon’s newest endeavor in “AWS QuickSight Will Disrupt Business Intelligence, Analytics Markets.” The market is already saturated with business intelligence vendors, but Amazon’s new AWS QuickSight will cause another market upheaval.

“This month is no exception: Amazon crashed the party by announcing QuickSight, a new BI and analytics data management platform. BI pros will need to pay close attention, because this new platform is inexpensive, highly scalable, and has the potential to disrupt the BI vendor landscape. QuickSight is based on AWS’ cloud infrastructure, so it shares AWS characteristics like elasticity, abstracted complexity, and a pay-per-use consumption model.”

Another monkey wrench for business intelligence vendors is that AWS QuickSight’s prices are not only reasonable, but are borderline scandalous: standard for $9/month per user or enterprise edition for $18/month per user.
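
For a sense of scale, the annual spend implied by those prices is easy to work out. Here is a quick sketch using only the per-user prices quoted in the article; the 50-person team is a made-up example:

```python
# Annual per-user cost for the two QuickSight tiers cited above.
# The $9 and $18 monthly prices come from the write-up; the 50-person
# team below is a made-up example for illustration.

TIERS = {"standard": 9, "enterprise": 18}  # USD per user per month

def annual_cost(tier: str, users: int) -> int:
    """Yearly spend for a team of `users` on the given tier."""
    return TIERS[tier] * users * 12

print(annual_cost("standard", 50))    # 5400 USD/year
print(annual_cost("enterprise", 50))  # 10800 USD/year
```

Even the enterprise tier for a sizable team comes in well under what traditional BI license agreements typically run, which is the point the article is making.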

Keep in mind, however, that AWS QuickSight is the newest shiny object on the business intelligence market, so it will have out-of-the-box problems, unknown long-term ramifications, and a reliance on database models and schemas. Do not forget that most business intelligence solutions do not resolve all issues, such as ease of use and comprehensiveness. It might be better to wait until the bugs are worked out of the system, unless you do not mind being a guinea pig.

Whitney Grace, November 11, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Photo Farming in the Early Days

November 9, 2015

Have you ever wondered what your town looked like while it was still rural and used as farmland? Instead of having to visit your local historical society or library (although we do encourage you to do so), you can consult Photogrammer, a Web-based image platform for organizing, viewing, and searching photos taken from 1935-1945 for the United States Farm Security Administration and Office of War Information (known as FSA-OWI for short).

Photogrammer uses an interactive map of the United States, where users can click on a state and then a city or county within it to see photos from the period. The archive contains over 170,000 photos, but only 90,000 have a geographic classification. The photos have also been grouped by photographer, although this facet is limited to fifteen people. Beyond city, photographer, year, and month, the collection can be sorted by collection tags and lot numbers (although these are not discussed in much detail).

While farm photographs from 1935-1945 do not appear to need their own photographic database, the collection’s history is interesting:

“In order to build support for and justify government programs, the Historical Section set out to document America, often at her most vulnerable, and the successful administration of relief service. The Farm Security Administration—Office of War Information (FSA-OWI) produced some of the most iconic images of the Great Depression and World War II and included photographers such as Dorothea Lange, Walker Evans, and Arthur Rothstein who shaped the visual culture of the era both in its moment and in American memory. Unit photographers were sent across the country. The negatives were sent to Washington, DC. The growing collection came to be known as “The File.” With the United States’ entry into WWII, the unit moved into the Office of War Information and the collection became known as the FSA-OWI File.”

While the photos do have historical importance, rather than maintaining a separate database with its small flaws, it would be more useful to incorporate the collection into a larger historical archive, such as the Library of Congress, than to keep it a pet project.

Whitney Grace, November 9, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

TemaTres Open Source Vocabulary Server

November 3, 2015

The latest version of the TemaTres vocabulary server is now available, we learn from the company’s blog post, “TemaTres 2.0 Released.” Released under the GNU General Public License version 2.0, the web application helps manage taxonomies, thesauri, and multilingual vocabularies. The web application can be downloaded at SourceForge. Here’s what has changed since the last release:

*Export to Moodle your vocabulary: now you can export to Moodle Glossary XML format

*Metadata summary about each term and about your vocabulary (data about terms, relations, notes, total descendant terms, depth levels, etc.)

*New report: reports about terms with mapping relations, terms by status, preferred terms, etc.

*New report: reports about terms without notes or specific type of notes

*Import the notes type defined by user (custom notes) using tagged file format

*Massively select free terms to assign to another term

*Improve utilities to take terminological recommendations from other vocabularies (more than 300: http://www.vocabularyserver.com/vocabularies/)

*Update Zthes schema to Zthes 1.0 (Thanks to Wilbert Kraan)

*Export the whole vocabulary to Metadata Authority Description Schema (MADS)

*Fixed bugs and improved several functional aspects.

*Uses Bootstrap v3.3.4

See the server’s SourceForge page, above, for the full list of features. Though as of this writing only 21 users had rated the product, all seemed very pleased with the results. The TemaTres website notes that running the server requires some other open source tools: PHP, MySQL, and an HTTP Web server. It also specifies that, to update from version 1.82, you keep db.tematres.php but replace the code. To update from TemaTres 1.6 or earlier, first log in as an administrator and update to version 1.7 through Menu -> Administration -> Database Maintenance.
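
The upgrade path above boils down to a simple decision on the installed version. Here is a sketch of that logic in Python; the function name and step wording are mine, while the db.tematres.php file and the 1.7 waypoint come from the post:

```python
def upgrade_steps(installed: str) -> list[str]:
    """Steps to reach TemaTres 2.0 from version `installed`, per the
    path described in the post. Step wording is illustrative; only
    db.tematres.php and the 1.7 waypoint come from the documentation."""
    steps = []
    # Versions 1.6 and earlier must first be brought to 1.7 via the
    # admin menu (Menu -> Administration -> Database Maintenance).
    if tuple(int(p) for p in installed.split(".")) <= (1, 6):
        steps.append("update to 1.7 via Menu -> Administration -> Database Maintenance")
    # From 1.7 onward: keep the config file, swap out the application code.
    steps.append("keep db.tematres.php")
    steps.append("replace the application code with the 2.0 release")
    return steps

print(upgrade_steps("1.82"))  # config kept, code replaced; no extra step
print(upgrade_steps("1.6"))   # requires the intermediate 1.7 update first
```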

Cynthia Murrell, November 3, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Libraries’ Failure to Make Room for Developer Librarians

October 23, 2015

The article titled “Libraries’ Tech Pipeline Problem” on Geek Feminism explores the lack of diverse developers in libraries. The author, a librarian, is extremely frustrated with the approach many libraries have taken. Rather than refocusing their hiring and training practices to emphasize technical skills, many are simply hiring more and more vendors, hardly a solution. The article states,

“The biggest issue I see is that we offer a fair number of very basic learn-to-code workshops, but we don’t offer a realistic path from there to writing code as a job. To put a finer point on it, we do not offer “junior developer” positions in libraries; we write job ads asking for unicorns, with expert- or near-expert-level skills in at least two areas (I’ve seen ones that wanted strong skills in development, user experience, and devops, for instance).”

The options available to librarians are to learn to code in their spare time (not viable) or to leave for the tech industry temporarily and bring the skills back after a few years. The second option is also full of drawbacks, especially since even white women are marginalized in the tech industry. Instead, the article stipulates that libraries need to make more room for hiring and promoting people with coding skills and interests while also joining coding communities like Code4Lib.

Chelsea Kerwin, October 23, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Spark Burns Down Hadoop

October 20, 2015

I read “Apache Spark vs Hadoop.” I conceptualized Ronda Rousey climbing in the octagon with Ramazan Emeev. A big gate. As a certain presidential candidate might say, “Huge.”

Alas, the dust up between Spark (MapReduce on steroids) and Hadoop (a batch operation clustering system) was not much of a contest, according to the article.

I highlighted this passage:

With Apache Spark, you can act on your data in whatever way you want. Want to look for interesting tidbits in your data? You can perform some quick queries. Want to run something you know will take a long time? You can use a batch job. Want to process your data streams in real time? You can do that too.

The key to the Spark wonderfulness is RDDs, or resilient distributed datasets. I underlined this definition:

They’re fine-grained, keeping track of all changes that have been made from other transformations such as map or join. This means that it’s possible to recover from failures by rebuilding from these transformations (which is why they’re called Resilient Distributed Datasets).
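
The lineage idea can be illustrated without Spark at all: instead of checkpointing the data, record the chain of transformations and replay it from the source to rebuild anything lost. Here is a toy sketch in plain Python, not the Spark API; the class and method names are invented for illustration:

```python
# A conceptual sketch of RDD-style lineage: the dataset stores its source
# plus an ordered list of transformations, and materializing (or recovering
# a lost partition) means replaying that lineage from the source.

class LineageDataset:
    def __init__(self, source, transforms=None):
        self.source = source                 # the original input data
        self.transforms = transforms or []   # ordered (kind, fn) pairs

    def map(self, fn):
        # Transformations are lazy: they only extend the recorded lineage.
        return LineageDataset(self.source, self.transforms + [("map", fn)])

    def filter(self, pred):
        return LineageDataset(self.source, self.transforms + [("filter", pred)])

    def compute(self):
        """Materialize by replaying the lineage over the source; this is
        also how a failed partition would be rebuilt after a crash."""
        data = list(self.source)
        for kind, fn in self.transforms:
            if kind == "map":
                data = [fn(x) for x in data]
            else:  # filter
                data = [x for x in data if fn(x)]
        return data

rdd = LineageDataset(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.compute())  # [0, 4, 16, 36, 64]
```

The real system tracks lineage per partition and across a cluster, but the recovery trick is the same: nothing needs to be written to disk because the transformations themselves are the backup.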

My goodness, with these features, poor, old Hadoop may not stand a chance. Now who would win a fight between Rousey and Emeev? One could, I assume, input data about the two fighters, perform some quick queries, and get an “answer.”

As with most NoSQL confections, will the answer match what happens in the ring?

Stephen E Arnold, October 20, 2015

Quote to Note: Halevy after 10 Years Before the Ads

September 23, 2015

If you track innovations at the Alphabet Google thing, you will know that a number of wizards make the outfit hum. One of the big wizards is Dr. Alon Halevy. He is a database guru, a patent holder, and now an essayist.

Navigate to “A Decade at Google.” The write up does not reference the ad model which makes research possible. Legal dust ups are sidestepped. The management approach and the reorganization are not part of the write up.

I did note an interesting passage, which I flagged as a quote to note:

It is common wisdom that you should not choose a project that a product team is likely to be embarking on in the short term (e.g., up to a year). By the time you’ll get any results, they will have done it already. They might not do it as well as or as elegantly as you can, but that won’t matter at that point.

I interpreted this to underscore Alphabet Google thing’s “good enough” approach to its technology. If you have time, think about the confluence of Dr. Halevy’s research and Dr. Guha’s. The semantic search engine optimization crowd may have a field day.

Stephen E Arnold, September 23, 2015

13 Big Data Trends: Fodder for Mid Tier Consultants

September 20, 2015

Let’s assume that a colleague has lost his or her job (xe, in Tennessee, I heard). The question becomes, “What can I do with my current skills to make big money in a hot new sector?”

The answer appears in “13 New Trends in Big Data and Data Science.” The write up is intended to be a round up of jazzy hot topics in a couple of even hotter quasi-new facets of the database world. Like enterprise search, databases are in need of juice. Nothing helps established technology more than new spins in old orbits.

My suggestion is to read through the list of 13 “new trends.” Pick one, and suggest it to your job hunting pal as a way to get hired. Nothing to it.

Allow me to illustrate the method in action.

I have selected trend 8, “The rise of mobile data exploitation.” There are some companies active in this field; for example, S2T. The S2T name means simulation software and technology. The outfit processes a range of digital information and analyzes it with the company’s own tools. Anyone can work in this sector. The demand for talent is high. The work is not too difficult. The desire to hire “experts” in various aspects of data is keen. No problem. Sure, there may be some trivial requirements like checking with a person’s mom and his or her best friends to make sure the applicant can be trusted. Hot trend. No problemo.

Let’s look at another field.

Trend 11 is high performance computing (HPC). What could be faster than Apple’s new mobile chip? What could be higher performance than the Facebook or Google infrastructure? If the job seeker is familiar with these technologies, the world of Big Data excitement awaits. The experience is the important thing, not knowledge of optimized parallelization pipelines.

Easy.

Each of the 13 trends makes it clear that there are numerous opportunities. These range from digital health (IBM Watson is a PR player) to the trivial world of analytic apps and APIs.

After reading the article, I was delighted to see how many important trends are getting buzz.

Big Data is definitely the go to discipline. I anticipate that anyone interested in search and content processing will be able to pursue a career in Big Data.

Now some skeptics believe that Big Data is a nebulous concept. Do not be dissuaded. The 13 trends are evidence that databases and the analysis of their contents are the future, just as these activities have been since the days of Edgar Codd.

The mid tier consultants can ride with the hounds.

Stephen E Arnold, September 20, 2015

What Is Your Database Worth?

September 11, 2015

I don’t have a single answer to this question. There is an interesting database valuation item in “CrunchBase Is Spinning Out, Backed by Emergence Capital.”

CrunchBase is an aggregator of technology company information. I think the service does a good job with companies in the Sillycon Valley area. The coverage tails off a bit for Rust Belt start ups, but that’s no surprise.

The database attracts two million “visitors” each month. I remain uncertain about the meaning of a “visitor,” but when most Web sites get a few hundred or fewer hits, two million seems like a lot. It is almost identical in hype, though, to Facebook’s one billion users in a 24 hour period. I was pretty good at math in grade school too.

The write up’s gem was this statement:

Eight-year-old, San Francisco-based CrunchBase looks to become a standalone company in the very near future. According to several sources, the unit, which calls itself the “definitive database of the startup ecosystem,” is finalizing a term sheet with the venture firm Emergence Capital Partners for an investment of between $5 million and $7 million.

Assume that the $7 million number is on the money. Spread across two million monthly visitors, that works out to $3.50 per visitor, or roughly $0.30 per visit if the traffic is annualized. That is almost real money in my book.
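
For the record, the arithmetic only lands near $0.30 if the monthly traffic is annualized. A quick check, with the annualization as my assumption; the $7 million and two million figures come from the write up:

```python
# Back-of-the-envelope check on the valuation arithmetic. The $7M upper
# bound on the round and the 2M monthly visitors come from the write-up;
# annualizing the traffic is my assumption to reach the ~$0.30 figure.

investment = 7_000_000        # USD, top end of the reported round
monthly_visitors = 2_000_000

per_monthly_visitor = investment / monthly_visitors        # $3.50
per_annual_visit = investment / (monthly_visitors * 12)    # ~$0.29

print(round(per_monthly_visitor, 2))  # 3.5
print(round(per_annual_visit, 2))     # 0.29
```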

Stephen E Arnold, September 11, 2015

Datameer Declares a Celebration

September 8, 2015

The big data analytics and visualization company Datameer, Inc. has cause to celebrate: it has received a huge investment. How happy is Datameer? Datameer’s CEO Stefan Groschupf explains on the company blog in the post, “Time To Celebrate The Next Stage Of Our Journey.”

Datameer received $40 million in a round of financing from ST Telemedia, Top Tier Capital Partners, Next World Capital, Redpoint, Kleiner Perkins Caufield & Byers, Software AG, and Citi Ventures. Groschupf details how Datameer entered the market in 2009 with a vision to democratize analytics. Since then, Datameer has helped solve problems across the globe and, he says, is even helping make the world a better place. He continues that he is humbled by the trust investors and clients place in Datameer, which underscores the importance of analytics not only for companies, but for anyone who wants supportable truth.

Datameer has big plans for the funding:

“We’ll be focusing on expanding globally, with an eye toward APAC and Latin America as well as additional investment in our existing teams. I’m looking forward to continuing our growth and building a long-term, sustainable company that consistently provides value to our customers. Our vision has been the same since day one – to make big data analytics easy for everyone. Today, I’m happy to say we’re still where we want to be.”

Datameer was one of the early contenders in big data, and it has managed to outshine and outperform its bigger name competitors. Despite its record growth, Datameer remains true to its open source roots. The company wants to make analytics available to every industry and everyone. What is impressive is that Datameer has applications for its products in fields from gaming to healthcare, a breadth that is unusual. Congratulations to Datameer!

Whitney Grace, September 8, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
