Inetsoft Incorporates Data Source Connector for Hadoop

February 20, 2014

Inetsoft released version 11.5 of Style Intelligence, its business intelligence platform, late in 2013. According to the Passionned Group post “New Release of Inetsoft Includes Hadoop Connector,” the new data source connector will

“enable users to add data from the Apache Hive data warehouse to their dashboards and analysis in the same easy way they connect any other data source.”

The update also features a complete restyle of the user interface, which is based on a self-service dashboard for data reporting. Style Intelligence is built on an open-standards SOA architecture and delivers data reports from both structured and unstructured sources.
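
The post does not show Inetsoft’s connector itself, but the capability it describes (querying Hive like any ordinary relational source) is easy to picture. Here is a minimal sketch using PyHive, one common open-source Hive client for Python; the host, database, and table names are invented for illustration:

```python
from pyhive import hive  # open-source HiveServer2 client, not Inetsoft's connector

# Connection details are illustrative; 10000 is HiveServer2's default port.
conn = hive.connect(host="hive.example.com", port=10000, database="default")
cursor = conn.cursor()

# HiveQL looks like SQL, which is what makes Hive feel like
# "any other data source" to a dashboard tool.
cursor.execute("SELECT region, SUM(revenue) FROM sales GROUP BY region")
for region, total in cursor.fetchall():
    print(region, total)
```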

In other words, Inetsoft has seen that Hadoop is where Big Data is headed, and it wants to make sure its own product can work with what is fast becoming the industry standard.

Laura Abrahamsen, February 20, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Thomson Reuters Acquires Entagen, Builds Cortellis Data Fusion Technology

February 19, 2014

The press release on ThomsonReuters.com titled “Thomson Reuters Cortellis Data Fusion Addresses Big Data Challenges by Speeding Access to Critical Pharmaceutical Content” announces the embrace of big data by revenue-hungry Thomson Reuters. The new addition to the suite of drug development technologies will offer users a more intuitive interface through which they will be able to analyze large volumes of data. Chris Bouton, General Manager at Thomson Reuters Life Sciences, is quoted in the article,

“Cortellis Data Fusion gives us the ability to tie together information about entities like diseases and genes and the connections between them. We can do this from Cortellis data, from third party data, from a client’s internal data or from all of these at the same time. Our analytics enable the client to then explore these connections and identify unexpected associations, leading to new discoveries… driving novel insights for our clients is at the very core of our mission…”

The changes at Thomson Reuters are the result of the company’s acquisition of Entagen, according to the article. That company is a leader in the field of semantic search and has been working with biotech and pharmaceutical companies, offering both development services and navigation software. Cortellis Data Fusion promises greater insights and better control over the data Thomson Reuters holds, while maintaining enterprise information security to keep that data safe.
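
Cortellis Data Fusion’s internals are not described, but the entity-and-connection idea Bouton outlines is essentially a graph problem. Here is a toy sketch with the networkx library; every node, relation, and source tag below is invented for illustration, not Cortellis data:

```python
import networkx as nx

# Toy graph of biomedical entities and the connections between them.
G = nx.Graph()
G.add_edge("GeneA", "DiseaseX", relation="associated_with", source="literature")
G.add_edge("DrugB", "GeneA", relation="inhibits", source="third_party")
G.add_edge("DrugB", "DiseaseY", relation="indicated_for", source="client")

# "Unexpected associations": indirect paths between entities that
# are not directly linked.
for path in nx.all_simple_paths(G, "DiseaseX", "DiseaseY", cutoff=3):
    print(" -> ".join(path))
```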

Chelsea Kerwin, February 19, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Marvel Introduced by Elasticsearch to Monitor and Manage Data Extraction

February 17, 2014

The TechCrunch article titled “Elasticsearch Debuts Marvel To Deploy And Monitor Its Open Source Search And Data Analytics Technology” provides insight into Marvel, which the article calls a “deployment management and monitoring solution.” Elasticsearch is a technology for extracting information from structured and unstructured data, and its users include big names such as Netflix, Verizon, and Facebook. The article explains how Marvel will work to manage Elasticsearch,

“Enter Marvel, Elasticsearch’s first commercial offering, that makes it easy to run search, monitor performance, get visual views in real time and take action to fix things and improve performance. Marvel allows Elasticsearch system operators, who manage the technology at companies like Foursquare, see their Elasticsearch deployments in action, initiate instant checkup, and access historical data in context. Potential systems issues can be spotted and resolved before they become problems, and troubleshooting is faster. Pricing starts at $500 per five nodes.”
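
Marvel itself is a commercial add-on, but the numbers it charts come from Elasticsearch’s standard monitoring APIs, which anyone can query. A minimal sketch with the official Python client; the node address is illustrative:

```python
from elasticsearch import Elasticsearch

# Address is illustrative; 9200 is Elasticsearch's default HTTP port.
es = Elasticsearch(["http://localhost:9200"])

# Cluster health: overall status (green/yellow/red), node and shard counts.
health = es.cluster.health()
print(health["status"], health["number_of_nodes"])

# Per-node statistics: JVM heap, OS load, and so on. Marvel charts
# this kind of data over time and in context.
stats = es.nodes.stats(metric=["jvm", "os"])
for node in stats["nodes"].values():
    print(node["name"], node["jvm"]["mem"]["heap_used_percent"])
```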

Elasticsearch reported revenue growth of over 400 percent in 2013, and Marvel will only further its popularity. Already a user-friendly and lightweight technology, Elasticsearch is targeting developers interested in real-time visibility into their data. Marvel may be great news for Elasticsearch and its users, but it is certainly bad news for competitor Lucid Imagination.

Chelsea Kerwin, February 17, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Advice on Making the Most of Limited Data

February 12, 2014

The article “How To Do Predictive Analytics with Limited Data” from Datameer on Slideshare suggests that Limited Data may come to rival Big Data in importance. The idea of “semi-supervised learning” is presented as a way to handle the difficulties of building predictions on limited data, such as the expense of labeling, manageability, and simply missing key data. The overview states,

“As it turns out, recent research on machine learning techniques has found a way to deal effectively with such situations with a technique called semi-supervised learning. These techniques are often able to leverage the vast amount of related, but unlabeled data to generate accurate models. In this talk, we will give an overview of the most common techniques including co-training regularization. We first explain the principles and underlying assumptions of semi-supervised learning and then show how to implement such methods with Hadoop.”

The presentation summarizes possible approaches to semi-supervised learning and the assumptions one can make about unlabeled data (these include the clustering, low-density, and manifold assumptions). It also covers the concepts of Label Propagation and Nearest Neighbor Join. However, as inviting as it is to forget Big Data and switch to predictive analytics with Limited Data, the suggestion may sound too much like Bayes-Laplace.
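
For readers who want to see the idea at toy scale rather than on Hadoop, scikit-learn ships a label propagation implementation: unlabeled points are marked with -1 and the model infers their labels from their neighbors. The data below is invented for illustration:

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Two tight clusters; -1 marks the unlabeled samples.
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9],
              [1.1, 1.1], [4.8, 5.2]])
y = np.array([0, 0, 1, 1, -1, -1])

# Propagate labels from labeled points to their unlabeled neighbors.
model = LabelPropagation(kernel="knn", n_neighbors=3)
model.fit(X, y)
print(model.transduction_)  # inferred labels for all six points
```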

Chelsea Kerwin, February 12, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Attivio and Quant5 Partner to Meet Challenges of Data Analytics

February 11, 2014

The article on PRNewswire titled “Attivio and Quant5 Partner to Bring Fast and Reliable Predictive Customer Analytics to the Cloud” explains the partnership between the two analytics innovators. Aimed at producing information from data without the hassle of a team of data scientists, the partnership promises to create insights that companies can act on. It responds to the growing frustration some companies face in gleaning useful information from huge amounts of data. The article explains,

“Attivio built its business around the core principle that integrating big data and big content should not require expensive mainframe legacy systems, handcuffing service agreements, years of integration and expensive data scientists. Attivio enterprise customers experience business-changing efficiency, sales and competitive results within 90 days. Similarly, Quant5 arose from the understanding that businesses need simple, elegant solutions to address difficult and complex marketing challenges. Quant5 customers experience increased revenues, reduced customer churn and an affordable and fast path to predictive analytics.”

The possibility of indirect sales following in the footsteps of Autonomy and Endeca does seem to be part of the 2014 tactics. The joint Attivio-Quant5 solutions are offered in five major areas of concern: Lead & Opportunity Scoring, Customer Segmentation, Targeted Offers, Product Usage, and Product Relationships.

Chelsea Kerwin, February 11, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Cloud Host Digital Ocean Reverses Imprudent Scrubbing Policy

January 31, 2014

Sometimes a company can grow too fast for its own good. Take the case of DigitalOcean, which eWeek describes in its piece, “Scrubbing Data a Concern in the Digital Ocean Cloud.” It was recently discovered that the cloud hosting firm was not automatically scrubbing user data after every deletion of a virtual machine (VM) instance, which is not good for security. Apparently, the young company once scrubbed after each VM destroy request but changed that policy as its growth ballooned.

Writer Sean Michael Kerner tells us:

“As Digital Ocean’s utilization went up, the company found that the scrubbing activity was degrading performance and decided to make it an option that API users needed to manually activate. [DigitalOcean CEO Moisey] Uretsky told eWEEK that even though the data scrubbing has an impact, it is now a cost that his company will bear.

Digital Ocean grew very quickly in 2013, to at least 7,000 Web-facing servers in June 2013, up from only 100 in December 2012, according to Netcraft. One of the reasons for the rapid rise has been Digital Ocean’s aggressive pricing, which starts at $5 for a server with 512MB of memory and a 20GB solid-state drive for a month of cloud service.”
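
For readers unfamiliar with the term, “scrubbing” simply means overwriting freed storage so the next tenant cannot read the previous VM’s remnants. A toy sketch of the idea; real providers do this at the hypervisor or storage layer, and the path and size below are invented:

```python
CHUNK = 1024 * 1024  # write zeros one mebibyte at a time

def scrub(path, size_bytes):
    """Overwrite a freed disk image (or device) with zeros.

    Illustrative only: production scrubbing happens at the
    hypervisor/storage layer and may use multiple passes.
    """
    zeros = b"\x00" * CHUNK
    with open(path, "r+b") as dev:
        written = 0
        while written < size_bytes:
            n = min(CHUNK, size_bytes - written)
            dev.write(zeros[:n])
            written += n

# scrub("/var/lib/vm/droplet-1234.img", 20 * 1024**3)  # e.g., a 20GB SSD image
```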

At least the company is taking responsibility for, and learning from, the mistake. Not only is DigitalOcean now faithfully scrubbing every deleted VM instance in sight, but Uretsky also specified that his company is hastening to make other changes based on customer feedback. They also, he noted, pledge not to reveal customer data to third parties. The imprudent scrub-optional policy affected only certain DigitalOcean API users, and it does not appear from the article that any programmers were harmed. Headquartered in New York City, DigitalOcean graduated from the TechStars startup accelerator program in 2012.

Cynthia Murrell, January 31, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Valuable Primer on Data Logs

January 24, 2014

Who knew LinkedIn could be so useful? The site’s Engineering blog supplies a thorough look at logs in “The Log: What Every Software Engineer Should Know About Real-Time Data’s Unifying Abstraction.” Writer and LinkedIn engineer Jay Kreps aims to fill what he sees as a large gap in the education of most software engineers. The site’s transition last year from a centralized database to a distributed, Hadoop-based system opened his eyes.

Kreps writes:

“One of the most useful things I learned in all this was that many of the things we were building had a very simple concept at their heart: the log. Sometimes called write-ahead logs or commit logs or transaction logs, logs have been around almost as long as computers and are at the heart of many distributed data systems and real-time application architectures. You can’t fully understand databases, NoSQL stores, key value stores, replication, paxos, hadoop, version control, or almost any software system without understanding logs; and yet, most software engineers are not familiar with them. I’d like to change that. In this post, I’ll walk you through everything you need to know about logs, including what is log and how to use logs for data integration, real time processing, and system building.”

He isn’t kidding. The extensive article is really a mini-course that any programmer who hasn’t already mastered logs should look into. Part one, titled “What is a log?”, covers logs in general as well as their place in both databases and distributed systems. Part two discusses data integration, including potential complications, the relationship to a data warehouse, log files, and building a scalable log. Part three takes up real-time stream processing, along with data flow graphs and log compaction. Part four covers system building, delving into the prospect of unbundling and where logs fit into system architecture. At the end, Kreps supplies an extensive list of resources for further study.
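
If the core abstraction still feels slippery, a few lines of code capture it. Here is a toy sketch of an append-only log with reader-tracked offsets, in the spirit of Kreps’s description (not LinkedIn’s actual implementation):

```python
class Log:
    """A toy append-only log: records go in at the end, in order,
    and each reader keeps its own position (offset)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        return self._records[offset:]

log = Log()
log.append({"user": 42, "action": "login"})
log.append({"user": 42, "action": "view_profile"})

# Independent consumers replay the same history from their own offsets.
print(log.read_from(0))  # a new subscriber sees everything
print(log.read_from(1))  # a caught-up subscriber sees only the newest record
```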

Cynthia Murrell, January 24, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Visual Mining Redesign

January 18, 2014

We are familiar with Visual Mining and its range of dashboard and data visualization software. Visual Mining has been working on products that help users better understand and analyze actionable business data. Its enterprise software line NetCharts is compatible across all platforms, including mobile devices and tablets. The company recently released its Winter 2013 Chartline Newsletter.

Along with the usual end-of-the-year greetings and gratitude, the first order of business in the newsletter is the Web site’s redesign.

Among the new features are:

  • “Live Demo: We would like to invite you to take a virtual test drive of our live NetCharts Performance Dashboards (NCPD) demo to see our newly restyled dashboard KPI’s.
  • Blog: Among the new items to explore on our site includes our new blog. This developer driven blog features new content with many different topics including tips and simple tricks to help you build and style your charts and dashboards. Keep coming back for lots more new content that will be added each month.
  • Chart Gallery: We also have a new chart gallery, which features all new examples with many different kinds of chart types to demonstrate some of the countless possibilities. We also added new chart type categories such as Alerting Charts and Showcase Charts. The Alerting Charts include different chart types that use alert zones while the Showcase category features chart examples with new and unusual styling approaches to demonstrate the flexibility of our charts.”

We have to wonder if the redesign came from a lack of Web traffic. Most Web sites are losing traffic, content processing vendors among them. Does Visual Mining hope to generate more sales and traffic based on its new look? We hope so.

Whitney Grace, January 18, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Data Broker Offered Sensitive Lists for Sale

January 14, 2014

Now this is downright creepy. The Wall Street Journal’s tech site Digits notifies us that “Data Broker Removes Rape-Victims List After Journal Inquiry.” As the headline states, the list has now been removed, but yikes! Medbase200 offered this tragic roster for sale, along with ones listing victims of domestic violence, HIV/AIDS patients, and “peer pressure sufferers,” until an inquiry from the Wall Street Journal prompted the company to remove them all. This looks like a very large hole in our HIPAA protections.

Writer Elizabeth Dwoskin reports:

“The rape-victims list was first disclosed by Pam Dixon, executive director of the World Privacy Forum, at a Senate hearing Wednesday about the data-broker industry. Ms. Dixon could not be reached for comment after her testimony.

The hearing was part of a Senate Commerce Committee investigation into the data-broker industry. In a report Wednesday, the committee said marketers maintain databases that purport to track and sell the names of people who have diabetes, depression, and osteoporosis, as well as how often women visit a gynecologist. The report said individuals don’t have a right to know what types of data the companies collect, how people are placed in categories, or who buys the information.

Medbase200, a unit of Integrated Business Services Inc., sells lists of health-care providers and of people purportedly suffering from ailments such as diabetes and arthritis to pharmaceutical companies.”

I will leave alone for now the whole issue of who owns an individual’s health data, because that is a rant for another day. Sam Tartamella, president of the parent company, seems to have been unaware of what Medbase200 was up to; he denied the list’s existence until presented by the Journal with a link to the division’s “rape sufferers” page.

Why, in the name of all that is holy, did the company offer these mailing lists for sale? Apparently, the cash it could make by vending the vulnerable trumped any sense of human decency. At least the target market was not predatory individuals (though is it a stretch to think such creatures could gain access?). Rather, it was pharmaceutical companies, which could drop just $79 and get information on 1,000 people who had been through a specific hardship. I can only imagine, but I think if I were in any of those categories, every instance of targeted marketing would be like a kick to the gut. Not to mention the distressing questions: How did they know? Who else knows? These unanswered questions could haunt someone for years, since “individuals don’t have a right to know” what these companies have done with our most personal information.

Welcome to the dark side of the data-driven society.

Cynthia Murrell, January 14, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

DCL Tapped for Library of Congress Digitization Project

January 14, 2014

The U.S. Library of Congress has enlisted the help of conversion-services firm Data Conversion Laboratory (DCL), we learn from “Library of Congress Signs Deal for Digital Content Services” at GCN. The firm will help implement standards for content in both the Library of Congress and the U.S. Copyright Office.

GCN editor Paul McCloskey tells us:

“The Copyright Office wants to set up a small number of standard formats, for itself and other institutions to preserve, ‘expand and maintain its collections as more and more journals are being published solely digital formats,’ DCL said. Since 2010, the U.S. Copyright Office has started to issue mandatory deposit requirements for files and metadata associated with electronic periodicals that are published online only and are to be added to the Library of Congress collection. DCL says it has met all of the Library’s specs in carrying the publishing mandates out, including having expertise with the PubMed Central Journal Article Tag Suite (JATS) specification for institutional repositories.”
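
JATS, mentioned in the quote, is an XML vocabulary for tagging journal articles. A toy example of what JATS-style tagging looks like and how a repository pipeline might read it; the fragment below is invented and trimmed to the bare minimum:

```python
import xml.etree.ElementTree as ET

# A hand-written fragment in the JATS article-tagging style
# (real deposits carry far richer front matter and a full body).
jats = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Online-Only Article</article-title>
      </title-group>
    </article-meta>
  </front>
</article>"""

root = ET.fromstring(jats)
print(root.findtext("front/article-meta/title-group/article-title"))
```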

Founded in 1981, Data Conversion Laboratory is a veteran of the digitization field. It pledges it can convert complex content from any format to any other, while offering related services like editorial support and conversion-project management. DCL is located in Fresh Meadows, New York.

Cynthia Murrell, January 14, 2014

Sponsored by ArnoldIT.com, developer of Augmentext
