Going Beyond ETL
May 25, 2013
Traditional data warehousing, or as it is often called, Extract, Transform and Load (ETL), has constituted an important enterprise software category. Now, these capabilities are built into products solving other data needs. The article, “Talend Ships Version 5.3 of Data Integration Platform,” discusses Talend’s new platform.
Talend specializes in Data Integration and offers an open source distribution of its platform. Beyond ETL it features Master Data Management, Data Quality, Business Process Integration and Enterprise Service Bus.
The inevitable question in this area: what about Hadoop?
“Talend version 5.3 now features a graphical mapper for building Apache Pig data transformation scripts visually (rather than having to code the data flows in the component’s language, “Pig Latin”), thus making an important Hadoop stack component a bit more analyst-friendly. Talend 5.3 can also generate native Java MapReduce code, which allows data transformations to run right on the Hadoop cluster, avoiding burdensome data movement, and making use of general purpose SQL and import/export tools like Hive and Sqoop unnecessary.”
The rest of the article mentions that Talend beefs up its connectors. They have added support for Couchbase, CouchDB, and Neo4j. Now they offer connectivity to databases across all four major NoSQL categories.
Megan Feil, May 25, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Guide on Onboarding Funnel and Jargon
May 24, 2013
A recent article from Woopra caught our attention: “How To Build And Optimize An Onboarding Funnel.” This post explains what onboarding funnels are and how to utilize them to their fullest capabilities. The onboarding funnel is one of the key analytics reports for any SaaS company. According to this article, there are only three main steps to building and optimizing one: tracking milestones, identifying major drop-offs, and optimizing problem areas.
Using Pinterest as an example, the article explains tracking onboarding milestones: this details the process from signing up to pinning a first item. Pinterest would be able to see its major drop-off at the step where users should follow 5 boards.
As far as fixing the problem areas, the article suggests looking more granularly at the problem and identifying the cause:
“Sometimes even going so far as to split up one step into two or more can help you diagnose the cause of a problem. For example, if you notice many users begin filling out your signup form, but then abandon it, you may want to separate the different sections of the form into several pages in order to see which section is causing users trouble. You may very well find that it is the requirement to add credit card information to start a free trial that is causing users to abandon.”
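The milestone tracking and drop-off steps described above can be sketched in a few lines of Python. This is a minimal illustration, not Woopra's method: the milestone names and user counts are hypothetical, loosely modeled on the Pinterest example, and `funnel_report` and `worst_step` are names invented for this sketch.

```python
from typing import Dict, List, Tuple


def funnel_report(steps: List[Tuple[str, int]]) -> List[Dict[str, object]]:
    """Compute step-to-step conversion and drop-off for an ordered funnel.

    `steps` is a list of (milestone name, number of users who reached it),
    ordered from the first milestone to the last.
    """
    report = []
    for (prev_name, prev_count), (name, count) in zip(steps, steps[1:]):
        # Guard against an empty previous step to avoid dividing by zero.
        conversion = count / prev_count if prev_count else 0.0
        report.append({
            "from": prev_name,
            "to": name,
            "conversion": conversion,
            "drop_off": 1.0 - conversion,
        })
    return report


def worst_step(report: List[Dict[str, object]]) -> Dict[str, object]:
    """Return the transition with the largest drop-off."""
    return max(report, key=lambda row: row["drop_off"])


# Hypothetical counts, made up for illustration only.
milestones = [
    ("signed_up", 1000),
    ("followed_5_boards", 400),
    ("pinned_first_item", 300),
]

report = funnel_report(milestones)
worst = worst_step(report)
for row in report:
    print(f'{row["from"]} -> {row["to"]}: {row["drop_off"]:.0%} drop-off')
print("Largest drop-off at:", worst["to"])
```

Splitting one step into several, as the quoted passage suggests, amounts to adding more entries to `milestones` so that the largest drop-off can be located more precisely.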
This article speaks to a common problem, but instead of breaking things down into simpler terms, the author overlays a framework and appears to be the analytics jargon prize winner.
Megan Feil, May 24, 2013
Positive and Negative Future Implications of Crowdoptic
May 24, 2013
The attempt to mine for insights in big data is not a new concept. The Huffington Post confirms this as they describe one of the more interesting pushes in this area. We learned more about Crowdoptic in “Visual Data Mining from Crowdsourcing: From Augmented Reality to Augmented Security?”
On the ever-continuous hunt for elusive metadata-laden images and other files, Crowdoptic focuses on the majority rules idea. This technology filters through files to find ones where people are/were “crowding” to click photos. Additionally, they can pinpoint hotspots within that given location where people are physically focusing their cameras. Of course this can be done in real time.
The article discusses this technology’s potential for augmented marketing and advertising:
“The potential uses for this kind of technology in business and marketing are still to be explored fully. The technology basically identifies what is holding the attention of people at a place at a given time. It is basically like Twitter trending, but with images posted online. And if the company’s claims are anything to go by, if they have a target location and time, the technology is capable of mining online visual data and pinpointing events or places that many people focus on with their smartphone cameras (basically, what people are looking at) in a matter of seconds.”
The article discusses not only marketing but also security, both for purposes of justice and for excessive surveillance of the kind depicted in Orwell’s “1984.” Keep these new controversial technologies coming in; this is better than Hollywood gossip.
Megan Feil, May 24, 2013
Copy Machine Grades Papers
May 24, 2013
Copy machines seem slightly outdated as they evoke images of futile technology a la Office Space. But Popular Science represents the antithesis of this, and so does the new photocopier discussed in “New Software Teaches Photocopiers How To Grade Papers.” Automated grading machines for multiple-choice exams have been around for decades, but this new Xerox machine takes the idea to a new level by grading handwritten answers.
The software, called Ignite, would keep track of which students are doing poorly and on which questions. At a glance teachers will be able to see who’s struggling and with what concepts.
According to the article:
“The software, called Ignite, needs some pointers first. Teachers enter in the test and an answer key, which Ignite uses not only to figure out which answers are right but also to know where on the page to look for handwritten answers. Teachers also need to tell the software what concepts each question covers. Fourth-graders at one school in Rochester, New York, that has tested the software were impressed. Their teacher, Pat McDonald, named their machine Ziggy and told the Democrat and Chronicle that the kids have written poems about Ziggy.”
We thought IBM’s Watson was fascinating. This steals its thunder. The practical applications and positive impact this could have on education are enormous.
Megan Feil, May 24, 2013
Open Source Security Remains Corporate Concern
May 24, 2013
When it comes to enterprise information technology concerns, security is usually at the top of the list. Some say that using open source software leaves an organization more susceptible to security risks, while others argue just the opposite. This very debate continues in the JavaWorld article, “Survey: Control and Security of Corporate Open Source Projects Proves Difficult.”
The article homes in on a particular component of the security issue: whether or not an organization utilizes an open source policy. Results were compiled through a survey:
“When the 3,500 survey respondents were asked what are the biggest challenges in their company’s open-source policy, the main reasons listed were ‘no enforcement,’ ‘it slows down development’ and ‘we find out about problems too late in the process.’ When asked who in the organization has primary responsibility for open-source policy and governance, 36 percent ascribed that role to ‘application-development management,’ 14 percent to ‘IT operations,’ 16 percent to legal, 13 percent to an open-source committee or department, 7 percent to security, 7 percent to risk and compliance and 7 percent to ‘other.’”
So of the organizations that do utilize an open source policy, many acknowledge little enforcement and paltry oversight. These concerns are real. However, an organization may benefit from a compromise, a value-added open source software option. A solution like LucidWorks is fully packaged and supported, not just free-roaming bits of code to be grabbed from the free web. Users and managers can feel more confident in LucidWorks because it is packaged in a way that is easier for them to understand. Most importantly, LucidWorks has long-term industry support and a positive track record.
Emily Rae Aldridge, May 24, 2013
Enterprise Search: Can Word Choice Rescue a Dogpaddling Business?
May 23, 2013
I read “Ontology Slays Data Integration and Ignites Semantic Search Revolution.” I found several things interesting about the write up.
First, there is the word choice: “slays,” “ignites,” and “revolution.” In case you have forgotten, an ontology is, according to the Catholic Encyclopedia:
Though the term is used in this literal meaning by Clauberg (1625-1665) (Opp., p. 281), its special application to the first department of metaphysics was made by Christian von Wolff (1679-1754) (Philos. nat., sec. 73). Prior to this time “the science of being” had retained the titles given it by its founder Aristotle: “first philosophy”, “theology”, “wisdom”. The term “metaphysics” (q.v.) was given a wider extension by Wolff, who divided “real philosophy” into general metaphysics, which he called ontology, and special, under which he included cosmology, psychology, and theodicy. This programme has been adopted with little variation by most Catholic philosophers. The subject-matter of ontology is usually arranged thus:
- The objective concept of being in its widest range, as embracing the actual and potential, is first analyzed, the problems concerned with essence (nature) and existence, “act” and “potency” are discussed, and the primary principles — contradiction, identity, etc. — are shown to emerge from the concept of entity.
- The properties coextensive with being — unity, truth, and goodness, and their immediately associated concepts, order and beauty — are next explained.
- The fundamental divisions of being into the finite and the infinite, the contingent and the necessary, etc., and the subdivisions of the finite into the categories (q.v.) substance and its accidents (quantity, quality, etc.) follow in turn — the objective reality of substance, the meaning of personality, the relation of accidents to substance being the most prominent topics.
- The concluding portion of ontology is usually devoted to the concept of cause and its primary divisions — efficient and final, material and formal –the objectivity and analytical character of the principle of causality receiving most attention.
My reaction? The use of the term ontology in the context of “slays,” “ignites,” and “revolution” seems a little frisky.
Second, the product referenced in the news release offers some relief. I find the explanation of the product, in terms of what it is not, quite interesting; to wit:
Ontology 4 is built to five key principles that separate it from traditional data integration technologies:
- No schema – Ontology uses a searchable, semantic model built on proven graph-based technology.
- No Integration – Ontology uses a semantic model to find and combine data relating to business entities fragmented across the enterprise.
- No Big Bang – Ontology’s semantic model embraces on-going changes while delivering value early and iteratively over the duration of a project.
- No Search Restriction – Ontology’s semantic search finds information across application data, documents and emails.
- No Upfront Risk – No integration to data sources, No unnecessary tying up of team resources, No feasibility surprises, and No problem changing project requirements.
“The Internet is the world’s largest source of data, yet no one integrates it. They search it,” concluded Enweani. “So, when it comes to enterprise data, we say ‘Search, don’t Integrate.’”
Third, enterprise search and the vendors engaged in the discipline demonstrated at two enterprise search summits in the last two weeks a strong shift away from the use of the word “search.” Synonyms included customer relationship management, discovery, search based applications, and similar distancing terms.
Perhaps more colorful word choice and the use of old style rhetorical flourishes will breathe life into a dogpaddling business sector. As one vendor, which recently experienced a CEO shuffle because the firm once again missed its numbers, put it, “We are now a platform.”
Will word choice deliver revenue? Investors hope so.
Stephen E Arnold, May 23, 2013
Sponsored by Augmentext
Facial Recognition Technology Is A WIP
May 23, 2013
Watch any crime-solving show on TV and the forensics department has facial recognition technology that can take a blurry photo and make it as clear as pure water. Sadly, Ars Technica points out that facial recognition technology is more fantasy than truth: “Why Facial Recognition Tech Failed In The Boston Bombing Manhunt.” The article points out the faults in facial recognition, citing how the suspected Boston bombers’ photos were in a database but cameras around the area failed to pick them up. The technology can work, but it almost needs the right person at the right time:
“Under the best circumstances, facial recognition can be extremely accurate, returning the right person as a potential match more than 99 percent of the time with ideal conditions. But to get that level of accuracy almost always requires some skilled guidance from humans, plus some up-front work to get a good image.”
Improved graphic quality and cloud computing make the process more reliable and accurate, even deployable to mobile devices. Multiple mobile devices with cameras at different angles can actually cobble together an image, but more cameras are not a solution. The current systems are not complex enough to handle it, but the technology is well on its way. For now, facial recognition is more science fiction than reality. It exists, but only in the beta phase.
Whitney Grace, May 23, 2013
The Negative Side Of Enterprise Software
May 23, 2013
You like it, you hate it, you love it, you loathe it. These seem to be the common conceptions when it comes to enterprise software. Despite all the praise enterprise software has garnered, Glider takes a look at “Why Enterprise Software Sucks: 6 years Later,” a retrospective on an article from 2007.
Back in 2007, enterprise software’s biggest problem was that the software buyers were not the end users. The buyers just needed to fulfill the requirements, and a good user experience was optional. Fast forward to the present day and things are better…somewhat. Users are able to cut out the middleman and buy their own products, and the software itself is more user-friendly. Companies are still facing slow adoption of the better product. Why? They are running on legacy systems and are afraid to touch them in case they fail. Then there is the trust factor: companies hear about the next technology but are reluctant to try it. Once the crowd migrates over, so will everyone else.
Does enterprise software have a future? Yes, it does:
“The world at large is quickly growing accustomed to consumer internet (and mobile) applications. Everybody in the world is on Facebook. The average person has over 50 apps on their phone. It’s just a matter of time until they expect the same quality in the tools they use at work. The consumerization of enterprise will only grow stronger. The same can be said for bottom-up adoption.”
Enterprise software is wanted; the mentality of the users just has to change to adopt it. If enterprise is “back,” are there lessons in this article for vendors of search, content processing and analytics systems, aka the Big Data crowd? Or have they already learned from where enterprise software failed in the past?
Whitney Grace, May 23, 2013
LucidWorks Raises $10 Million in Capital
May 23, 2013
LucidWorks continues to raise capital, helping the company build and support open source software that empowers organizations to manage their multi-structured data. Venture Beat covers this latest round of venture capital in their story, “LucidWorks Pulls in $10M to Turn Open Source Data Into ‘Business Gold.’”
The article states:
“‘Big data’ startup LucidWorks has raised $10 million to help enterprise companies ‘turn multistructured data into business gold’ . . . According to a form filed with the SEC, existing investors Shasta Ventures, Granite Ventures, and Walden International contributed to this third round of funding. It brings LucidWorks’ total capital raised to $26 million.”
The company employs one-fourth of the committers on the Apache Lucene/Solr project, upon which their LucidWorks Search and LucidWorks Big Data offerings are built. Big customers include AT&T, Elsevier, Cisco, Nike, Sears, and Ford, among others. The company is truly doing well, and this additional capital will help improve their scope and reach. Their support offerings set them apart from the pack, and their investment in open source is sincere, sponsoring multiple training and development events across the country. If they stay on this path, good things will continue to happen to LucidWorks.
Emily Rae Aldridge, May 23, 2013
Phone Data Value And What Companies Are Doing With It
May 23, 2013
A smartphone is an extension of a person’s life, and it records that life every time it is used. Smithsonian Magazine takes a look at how phone companies are tracking and using the data from phones in “What Phone Companies Are Doing With All That Data From Your Phone.” Verizon Wireless is aware of the phone data goldmine and has added a new division called Precision Market Insights, and Telefonica is adding a new business unit, Telefonica Dynamic Insights, to do the same thing. Phone data is being used for market, medical, and social science research. The biggest usage is tracking how people move in real time. The data collected is supposed to remain anonymous, but that is not happening.
People can be tracked:
“But a study published in Scientific Reports in March found that even data made anonymous may not be so anonymous after all. A team of researchers from Louvain University in Belgium, Harvard and M.I.T. found that by using data from 15 months of phone use by 1.5 million people, together with a similar dataset from Foursquare, they could identify about 95 percent of the cell phones users with just four data points and 50 percent of them with just two data points. A data point is an individual’s approximate whereabouts at the approximate time they’re using their cell phone.”
People’s travel and cell phone patterns are repetitive and unique, making it easy to narrow down results to an individual user. Anonymity is a hard thing to achieve with a smartphone. To confuse the data, a person could get two mobile phones, but then does that increase the fun or increase the risk?
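The re-identification idea in the quoted study can be illustrated with a toy Python sketch. It is not the researchers' method or data: the three traces are invented, a "data point" is simplified to a (place, hour) pair, and the hypothetical `unicity` function enumerates every combination of k points exhaustively rather than sampling points at random from a large dataset.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# A data point: (approximate place, approximate hour of day).
Point = Tuple[str, int]


def matching_users(traces: Dict[str, Set[Point]], points: Iterable[Point]) -> List[str]:
    """Return every user whose trace contains all of the given points."""
    wanted = set(points)
    return [user for user, trace in traces.items() if wanted <= trace]


def unicity(traces: Dict[str, Set[Point]], k: int) -> float:
    """Fraction of k-point subsets (taken from each user's own trace)
    that match exactly one user in the whole dataset."""
    unique = 0
    total = 0
    for trace in traces.values():
        for subset in combinations(sorted(trace), k):
            total += 1
            if len(matching_users(traces, subset)) == 1:
                unique += 1
    return unique / total if total else 0.0


# Invented "anonymized" traces, for illustration only.
traces = {
    "user_1": {("home_A", 8), ("office_X", 9), ("gym", 18)},
    "user_2": {("home_A", 8), ("office_X", 9), ("cafe", 13)},
    "user_3": {("home_B", 8), ("office_X", 9), ("gym", 18)},
}

for k in (1, 2, 3):
    print(f"k={k}: {unicity(traces, k):.2f} of point sets single out one user")
```

Even in this tiny example, the share of point sets that single out one person rises as k grows, which is the intuition behind the study's finding that a handful of points is often enough.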
Whitney Grace, May 23, 2013