Harnessing The Power Of Raw Public Data
May 28, 2013
The Internet allows multiple data streams to converge and release their data to end users, but very few people know how to use public data, much less how to find it. There is a solution, reports TechCrunch in the article “Enigma Makes Unearthing And Sifting Through Public Data A Breeze.” Enigma is a New York startup with Hicham Oudghiri, Marc DaCosta, and CEO Jeremy Bronfman on the team. The company’s software pulls data from over 100,000 public data sources and pools it in easy-to-read tables.
“That’s all very neat, but how does Enigma do it? The data itself comes from a host of places, but most of Enigma’s government data was obtained by issuing a Freedom of Information Act request to the U.S. General Services Administration for all the top level .gov domains. From there the team uses crawlers to download all the databases it can find, and algorithmically finds connections between all those data points to create a sort of public knowledge graph. Whenever you search for a term on Enigma, Enigma actually searches around that term to figure out and display whatever applicable data sets it can find.”
Enigma should be seen more as an infrastructure search solution, and the company heads believe it could become an integral part of the Internet in five years. As a tool, it has many benefits for researchers, and it has already formed partnerships with the New York Times, Capital IQ, S&P Capital, Gerson Lehrman Group, and the Harvard Business School. The startup sells an enterprise product at the moment, but there are possible plans for a free version in the future. Enigma pulls all its data from public resources, but it must comply with the laws and regulations that come with the information. Enigma wants to play by the rules, and by playing within the bounds it hopes to become an indispensable tool.
Whitney Grace, May 28, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Basho Cofounder Takes New Direction
May 28, 2013
Antony Falco was a co-founder of the Basho Riak distributed open source database. However, he has changed directions and is working on a new project. Read about Falco’s latest project in the TechCrunch article, “Basho Co-Founder Raises $3M To Launch Orchestrate.io, A Twilio For Databases.”
The article begins:
“Basho Co-Founder Antony Falco has raised $3 million for Orchestrate.io, a database API similar to Twilio in its capability to ease the complexity of adding features to mobile and web applications. True Ventures led this initial round joined by Frontline Ventures and Resonant Venture Partners. Falco, who left Basho a few months ago, said Orchestrate.io solves the problems that developers face when building feature-rich applications. Often it means adding multiple databases for geo-spatial, time series or any number of other features.”
The article goes on to explain that the scaling limits of relational databases have led large tech companies like Google and Amazon to develop new types of databases for high-volume queries. Falco hopes his new service will add functionality by pulling the data through an API. He is using existing open source databases to build the project, including Riak.
This is an example of the type of creativity and innovation that flourishes in the open source community. Also in the open source field serving enterprises, LucidWorks focuses on another angle, enterprise search and Big Data. These elements are complementary, with organizations finding that open source solutions often pair well together, adapting and scaling efficiently.
Emily Rae Aldridge, May 28, 2013
Quote to Note from News Corp. Executive
May 27, 2013
My New York Times arrived late in Harrod’s Creek, Kentucky today. I did read “News Corp. Says It Was Not Told of Subpoena for Reporter’s Phone Records.” You may be able to find the story online. The hard copy in my hand ran the item on Page B3.
I don’t know too much about News Corp. I recall that the firm had some push back with regard to allegedly improper access to some folks’ telephone messages. I also know that the company is planning on splitting into different parts. I am not sure if Fox News is journalism like the Wall Street Journal or TV like the “great wasteland.”
Here’s the quote I circled:
In e-mail to employees on Thursday, Roger Ailes, chairman and chief executive of Fox News, rejected the validity of the investigation, “We will not allow a climate of press intimidation, unseen since the McCarthy era, to frighten any of us away from the truth,” Mr. Ailes said.
Fascinating stuff. Invocation of McCarthy, a reference to intimidation by the US government, and the truth. I wonder how “truth” factors into the information in the NPR story at http://goo.gl/RzJwS.
Stephen E Arnold, May 27, 2013
Enterprise Search Rumor Round Up
May 27, 2013
I attended two separate search events in the last couple of weeks. In the course of listening to presentations of widely varying quality and talking with people who sure seemed to know everything there was to know about search, I picked up some rumors.
I don’t know if these items are accurate. I want to summarize these as “of possible interest.” If you, gentle reader, can verify or debunk any of these rumors, please, use the comments section of this blog.
ITEM: A German search engine vendor does not market in the US. The rumor was that the company president brings “selected” staff to vacation at the conference venue. Good use of investor funds? If true, probably not. If not true, we will have another non US vendor chasing government contracts, OEM deals, and Fortune 1000 firms.
ITEM: Amazon thinks its labor hassles in Germany are “old news.” Amazon, like Google, seems indifferent to some quite real concerns.
ITEM: The outfit doing business as BA Insight has experienced a change at the top. I found this item interesting because the company has ingested millions of dollars and does not light up my radar as a challenger to the crowns once worn by Autonomy and Endeca.
ITEM: European government money is getting tight. One search vendor estimates that unless sales are made in the US market, the government funded company will be history in 2014. The big point? The US market is seen as the last great hope for survival.
ITEM: Exhibitors pay money to get their names on the programs. A couple of the event sponsors did not show up. Better things to do or a cash crunch?
If useful information turns up in the comments, we will pass it along.
Stephen E Arnold, May 27, 2013
Big Brother Is Hanging Around
May 27, 2013
It is easier now for governments to download pieces of spyware and monitor unsuspecting citizens. Take note of Quartz’s article, “36 Governments (Including Canada) Are Now Using Sophisticated Software To Spy On Their Citizens,” about how the Canadian research center Citizen Lab discovered that thirty-six countries have bought intrusive IT technology from Gamma International.
Gamma International sells its products exclusively to governments. Most governments say they use the software to track dissidents and suspicious activity and to monitor organized crime groups. Gamma probably states that it does not want its products to be used for negative purposes, but:
“The product may also have been used in the past by repressive nations hoping to monitor dissidents. In his new book, Eric Schmidt mentions ‘a raid on the Egyptian state security building after the country’s 2011 revolution [which] produced explosive copies of contracts with private outlets, including an obscure British firm that sold online spyware to the Mubarak regime.’ Gamma denied that it had supplied the regime with its program, which its agents were hawking for a piddling $560,000.”
Gamma is not the first company to cash in on this market, and it will not be the last. In relation to search, it makes you wonder whether governments consider this a viable search system for finding information. We wonder what search engine they use.
Whitney Grace, May 27, 2013
The New Hacker Class
May 27, 2013
Enrolling in Hacker School might sound like learning to be a criminal coder, but it is actually an intensive program that lasts three months, four days a week for eight hours a day. The goal is for its students to learn how to be better programmers, akin to an old-fashioned training trip. The school’s blog announces, “Peter Norvig And Eight Others Are Hacker School Residents.” For those unfamiliar with him, Peter Norvig is the Director of Research at Google, and his residency bespeaks his dedication to helping students learn new tricks of the trade. Residents spend one or two weeks with the Hacker School and share their experience and knowledge with the students.
“We want to make Hacker School the best place on earth to become a great programmer, and we want Hacker School to be the most productive three months of our students’ lives. That’s why we started our residents program last year to bring the best programmers we can find to Hacker School. Residents come for one or two weeks and work directly with students: They pair program, do code reviews, give short talks, run seminars, and bond with the batch.”
Programs like this are a great addition to a resume, not to mention an amazing networking tool. It also proves that Google is dedicated to teaching the next generation.
Whitney Grace, May 27, 2013
Analytics Company to Disrupt Digital and Mobile Metrics Emphasis
May 27, 2013
From Business Insider comes news of a potentially disruptive startup: “Mixpanel, A Startup That Wants To Kill Pageviews And Other ‘BS Metrics’ Now Measures 12 Billion Actions Per Month.” Mixpanel Co-founder Suhail Doshi pushes for digital and mobile companies to highlight monthly user engagement numbers instead of page views.
Mixpanel is an analytics company founded in 2009. It helps both paying and non-paying customers track engagement through actions on their sites. For example, “liking” content on Facebook is an action.
According to the article:
“Doshi admits it’s harder for content-producers to shift to his way of thinking. But changing an industry standard like pageview reporting is a slow process, and Doshi thinks his company is making good headway. ’We’re this living, breathing case that we do see pageviews are dying,’ says Doshi, who was inspired to track meaningful analytics by mentor and former colleague, Max Levchin. Pageviews are already dying on mobile devices, says Doshi, because users rarely click through to see more pages on tiny screens.”
Mixpanel’s growth implies they are doing something right. However, regarding Google Analytics, Mixpanel is making some bold assertions.
Megan Feil, May 27, 2013
FOCUS Issue 29 on Big Data Available Now
May 27, 2013
Datacenter Dynamics’ publication, FOCUS, just released issue 29 with an emphasis on Big Data. Penny Jones provides a look into the latest edition in her story, “FOCUS 29 – the Big Data Issue. Read it Online Now!” which appears on the Datacenter Dynamics blog.
To introduce the topic of Big Data, the author discusses the concept with the head of CERN’s open lab project, Bob Jones. She quotes him here:
“Jones likens CERN to a big industrial plant, which is quite fitting for GE’s purpose. GE pushes big data solutions focused on the Industrial Internet (see Pages 30-33).”
She then continues by introducing another leading mind in the world of Big Data:
“Another vendor representative I spoke with, LucidWorks’ CTO Grant Ingersoll, told me machines and the Internet – funnily enough – are the reason we have big data in the first place. ‘[Big data] is this unique combination of man and machine that work together on data that brings new business insights and solves complex problems that, until now, have not been tackled. In many ways, business intelligence (BI), big data and analysis tools are closely related.’”
LucidWorks definitely has a voice in the Big Data market, as they seek to aid businesses with efficiency and accuracy in their enterprise search and Big Data needs. Built on Apache Lucene and Solr, the LucidWorks offerings do deliver on the promise that Ingersoll makes above, ensuring that business intelligence is coupled with analysis and Big Data in a way that ensures a competitive advantage.
Emily Rae Aldridge, May 27, 2013
Big Data: O(log n) Again to Calculating Bad Presentations and Lousy Management
May 26, 2013
I participated in a quite interesting Big Data “event” recently. (Sorry, no link. I want to leave my post adrift in the sea of saucisson which makes up the Internet today.)
Big Data as a concept has been with us as long as there are people and storage. If a stack of clay tablets would not fit in a cabinet in ancient Babylon, the hapless analyst had a Big Data problem. When I had a Wang mini in my closet, Big Data was anything larger than 80 megabytes.
In my view, Big Data is a bit of a marketing confection. When companies cannot sell their core product like enterprise search, these outfits just tell the sales and marketing consultants to whip something up. Big Data is in my view one popular junk food when processed by some folks.
Let me set the stage.
A self appointed “expert” tried to organize a two-day lecture series to explain the basics of Big Data to those seeking information about this hot concept. The line up of teachers included marketers who took an interesting intellectual approach and me, the addled goose.
Now shift gears and think in terms of airline food. You are hungry on a flight from New York to Paris and the airline serves up a stale cracker and a substance which, from a distance, looks like cheese. Once up close, the combination does not deliver haute cuisine. Heck, the Big Data event was not even the equivalent of meals ready to eat or MREs left in the equatorial sun for a couple of weeks in an undisclosed location.
The aftermath of the Chernobyl event captured my impression of how misinformation about Big Data can set the stage for flawed decisions and catastrophic financial issues. Those emergency systems sure worked well in the engineering models, didn’t they? A happy quack to http://goo.gl/5u2E3.
Three impressions struck me as I reflected on the two-day event I attended with two of my colleagues. (More about my two colleagues in a good news moment.)
First, the self-aggrandizing poobah who was the “maestro of Big Data” left about mid way through Day Two. I am familiar with “experts” and azure chip consultants who have more pressing business than sticking with something to its completion. I was reminded of the behavior of the Costa Concordia captain. Was this disappearing act an indication of disorganization or craw fishing from failure? The steady attrition of paying attendees was evident by the third talk on the first day of the seminar. In fact, on Day One, Hour One, I counted 58 people in a room which was set to handle about 120. When the “maestro of Big Data” flew the coop, there were 15 people in the room. My talk, the end note for the event, pulled 30 people, up from the low of 15 at 2 pm on Day Two. Who introduced me? No one. Who stepped in to handle the last two hours of the event? My team, thank you.
Second, in my opinion, the majority of the speakers’ presentations were like most of the content on Slideshare, a business marketing service owned by LinkedIn, a job hunting service. (I think some of the speakers are denizens of LinkedIn, which I find quite amusing.) In my opinion, the Slideshare approach to business information for many “members” is to take familiar, well-worn buzzwords. Then add a couple of trendy Hacker News references. Transfer recycled information to PowerPoint slides. And then serve up cold. I learned a great deal about SAP and how wonderful the company is for just about anything Big Data. I tuned out after the sixth or seventh worshipful reference. Although addled, I pick up on stuff once I hear the same old refrain three or four times. If you are interested in what was not covered in the seminar lectures, navigate to “What does O(log n) mean exactly?” I included one diagram with information about Big O, but marketers in my lecture lapsed into a coma when I mentioned the concept.
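For readers who want a concrete feel for O(log n), here is a minimal sketch using binary search, a textbook logarithmic algorithm. It is not taken from the lecture or its diagram; the function name and the step counter are illustrative assumptions added only to make the growth rate visible.

```python
def binary_search_steps(sorted_items, target):
    """Return (index, steps) for target in sorted_items, or (-1, steps) if absent.

    Each pass through the loop halves the remaining search range, so the
    number of steps grows with log2(n) rather than with n itself.
    """
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1, steps


if __name__ == "__main__":
    data = list(range(1_000_000))
    index, steps = binary_search_steps(data, 999_999)
    print(f"found at index {index} after {steps} comparisons")
```

With a million sorted items the loop needs at most 20 passes, because 2 to the 20th power already exceeds one million. Making the data a thousand times larger adds only about ten more passes, which is the practical point behind the notation.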
Third, I learned from several of the attendees that the Big Data sessions did not meet their expectations. I did deliver a lecture, and I had 30 people in the audience, not counting my two colleagues. I am not sure where these folks came from. We let them in the lecture hall because the organizer’s staff had wandered off to do more important tasks I assume. And, then — surprise, surprise — after my talk, five individuals clustered around me. Two of my colleagues witnessed the clump of groupies. I was hoping for major press coverage and maybe a Project Runway designer as fans. Instead, I got five — I am still in shock — mathematicians. The key comment witnessed by two experienced special librarians was, “You were the only speaker who told me how to think about Big Data problems. Very good.” I am not much of a thinker. I was not sure whether the adoring PhD from Rutgers was pulling my leg or speaking about the dismal quality of the other folks who were doing the Slideshare marketing thing. I was pleased with the feedback.
And, most importantly, I want to thank my two colleagues, Constance Ard, an honest to goodness, straight shooting law librarian, and Delores Meglio, once a New York Times’ executive who also worked with me at Ziff and who was part of the senior management team at Elsevier Knovel, for:
- Stepping in when the “maestro” of the event disappeared because he had more pressing business than witnessing the sinking of his Titanic event. The event organizer’s staff apparently had to beat the commute rush home
- Facilitating the question and answer period which lasted a full 20 minutes after my lecture ended at 5 pm Eastern
- Chasing down an audio visual person to turn on the microphone and turn on the projection device. Apparently the show organizer’s team had better things to do than watch one speaker after another drive people from the room. I do not know if any paying customers were crying, but I would not rule anything out.
Will I reveal who organized this event? Nope.
Will I write a memo to the organizer, offering helpful suggestions? Nope.
Will I point out which speakers scored a perfect 10 on the Slideshare airline food quality scale? Nope.
Why not?
I was thrilled to experience once again how people on my team deliver even when it is not their job.
Believe me when I say I was proud of Constance Ard’s and Delores Meglio’s spontaneous action. They made the last two hours of the Big Data event a success for the remaining attendees.
Did I tell Constance and Delores to step in? Nope.
Like others on my team of 30 people, Ms. Ard and Ms. Meglio are old-school professionals. Both believed that those in the Big Data lecture hall deserved 100 percent attention and effort. I was able to focus on making my talk the best it could be.
Could I have done a better job? Sure. Did I try to do my best? Yes. I delivered despite the unprofessional setting in which I was placed.
Perhaps conference organizers and Big Data maestros could learn something about commitment and initiative by talking with people like Constance Ard and Delores Meglio, and ignoring the marketers who promise and, in my opinion, frequently fail to deliver?
Shift to another event commitment I had that same day, right after the Big Data lecture.
In sharp contrast was the Startupalooza event sponsored by iBreakfast at John Jay College of Criminal Justice. I was asked to evaluate 15 start up ideas over a period of three hours. I want to point out that the Startupalooza event was organized, dynamic, professional, and exciting.
The reason?
The iBreakfast team, the 43 participants, and the five evaluators were engaged. The innovators pitching start up ideas responded to the professionalism of the event and stepped up their game.
For me, the difference between the two events was as clear as choking down an MRE and chowing down at Le Bernardin.
Oh, what about my presentation and work at Startupalooza?
Those groupie mathematicians thought my talk was pretty good. What do math people know anyway? But I had three entrepreneurs clump around me after Startupalooza. Constance Ard extracted me because I continued to deliver at 100 percent even though I was burning will power to keep going at 9 pm after my Big Data lecture.
My hope is that those younger than me try hard, do a professional job, and stick with commitments. My team performs in this manner. Will others follow Startupalooza’s, Ms. Ard’s, and Ms. Meglio’s example?
I sure hope so. Events succeed for many reasons. Professionalism is just one element and may, in some situations, be the catalyst for rising above mediocrity.
Stephen E Arnold, May 26, 2013
Sponsored by Augmentext
Aggressive Price Range Set for Tableau IPO
May 26, 2013
Tableau Software has its IPO scheduled for May 16 and has been spotted in the news quite a bit as of late. Greencrest Capital Management’s most recent email campaign was on “Tableau Software (DATA).” A brief but informative summary accompanies a chart available for download as a free preview.
The actual document is nine pages and covers Tableau’s IPO, the company’s innovation in data visualization, how it handles big data, and a model aiming for profitability. The preview tells us that Tableau Software has set an aggressive IPO price range.
We also learned:
“The company updated its S-1 on Monday, ahead of its IPO on May 16. Tableau plans to offer 7.2MM shares at a range between $23-26 per share, suggesting an implied market capitalization of $1.4B at the midpoint, or 3.6x our 2014 revenue estimate of $390MM. We have updated our valuation analysis to reflect market capitalization of Tableau and price/share – following the release of the exact offering size and shares outstanding.”
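The figures in the quoted passage can be sanity checked with a few lines of arithmetic. The sketch below uses only the numbers that appear in the quote; the function names are illustrative, and the implied share count is a back-of-the-envelope derivation from those numbers, not a figure reported by Greencrest or Tableau.

```python
def midpoint(low: float, high: float) -> float:
    """Middle of a price range."""
    return (low + high) / 2


def revenue_multiple(market_cap: float, revenue: float) -> float:
    """Market capitalization expressed as a multiple of revenue."""
    return market_cap / revenue


def implied_shares(market_cap: float, price: float) -> float:
    """Total shares implied by a market capitalization at a given share price."""
    return market_cap / price


if __name__ == "__main__":
    # All inputs come from the quoted Greencrest summary.
    price = midpoint(23.0, 26.0)      # $23-26 per share range
    market_cap = 1.4e9                # $1.4B implied at the midpoint
    revenue_2014 = 390e6              # $390MM 2014 revenue estimate

    print(f"midpoint price: ${price:.2f}")
    print(f"revenue multiple: {revenue_multiple(market_cap, revenue_2014):.2f}x")
    print(f"implied shares: {implied_shares(market_cap, price) / 1e6:.1f}MM")
```

The midpoint works out to $24.50, and $1.4B divided by $390MM is roughly 3.6, which matches the multiple in the quote. Dividing the market capitalization by the midpoint price suggests something on the order of 57 million shares in total, of which the 7.2MM offered shares would be a small slice.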
Want some insight into Tableau’s big play IPO? Well, it’s brief but there’s an interesting chart. Check it out here.
Megan Feil, May 26, 2013