November 11, 2013
The article titled “The Rise and Rise of Palantir and Its Deep Domain Knowledge” on Crikey follows the move of Palantir Technologies, a data mining company seeded with a $2 million investment from the CIA, into Canberra, Australia. Palantir has seen its fair share of press, good and bad, but ever since an Anonymous hack in 2011 exposed a proposal, in which Palantir was involved, to destroy WikiLeaks’ credibility, the adjective “ruthless” seems appropriate. The company, founded in 2002, entered the Australian market in 2011 and has seen enormous success. The article explains,
“The Department of Defence began using some of its software in 2011 via third-party providers, but this year has seen the company grow rapidly… Top-flight lobbying firm Government Relations Australia was hired to represent them in Canberra and state capitals. In the last few weeks, the company has secured multi-year contracts with the Department of Defence’s Intelligence and Security branch worth nearly $2 million, all secured via limited tender…Those of course are the contracts we know about.”
The article speculates that the Australian government is utilizing Palantir because data mining has proved effective for national security work. While the ACLU believes such companies pose a massive threat to the privacy of civilians, governments continue to invest in cybersecurity companies.
Chelsea Kerwin, November 11, 2013
October 6, 2013
What is visual data mining? I know that data mining involves searching through data with a computer program in search of specific information. I am guessing that visual data mining involves the same process, except it presents the data using visual patterns. Am I right? Am I dead wrong? I do not know, but I do know the way to find the answer is to read Visual Data Mining: Theory, Techniques and Tools for Visual Analytics by Arturas Mazeika, Michael H. Böhlen, and Simeon Simoff.
Here is the item description from Amazon:
“The importance of visual data mining, as a strong sub-discipline of data mining, had already been recognized in the beginning of the decade. In 2005 a panel of renowned individuals met to address the shortcomings and drawbacks of the current state of visual information processing. The need for a systematic and methodological development of visual analytics was detected. This book aims at addressing this need. Through a collection of 21 contributions selected from more than 46 submissions, it offers a systematic presentation of the state of the art in the field. The volume is structured in three parts on theory and methodologies, techniques, and tools and applications.”
This book usually retails for a whopping $99.00, or $63.91 with the Amazon discount. That is still a hefty chunk of change for a 163-page book, which is why we are pleased to say that if you are a member of ISBN Book Funder or OnlineBooks.com, it is available to you for free. Other books are free for members as well. If that does not appeal to you, check out your local academic library.
Whitney Grace, October 06, 2013
September 1, 2013
The Justice League’s headquarters, whether the Hall of Justice or the Watch Tower, has state-of-the-art equipment to track bad guys and their criminal activities. We puny mortals might actually have a tool to put Batman’s own deductive skills to shame with big data, says The News Factor in the article, “Watch Out, Terrorists: Big Data Is On The Case.” Big data is nothing new; we just finally have the technology to aggregate the data and follow patterns using data mining and data visualization.
The Institute for the Study of Violent Groups is searching through ten years of data about suspected groups and individuals involved with terrorism and other crimes, discovering patterns and connections that were impossible to find before. Microsoft’s security researchers are up to their eyeballs in data that they analyze daily for cyber attacks, and the company recently allocated more resources to developing better network analytical tools.
The article says that while these organizations’ efforts are praiseworthy, the only way to truly slow cyber crime is to place a filter over the entire Internet. Here comes the company plug:
“That’s where new data-visualization technology, from vendors such as Tableau and Tibco Software, hold potential for making a big difference over time. These tools enable rank-and-file employees to creatively correlate information and assist in spotting, and stopping, cybercriminals.”
Big data’s superpowers are limited to the isolated areas where it has been deployed; its major weakness is the sheer scale of the entire Internet. Again, it is not the be-all, end-all answer.
Whitney Grace, September 01, 2013
August 16, 2013
The hoohah about cloud computing, Big Data, and other “innovations” continues. Who needs Oracle when one has Hadoop? Why license SPSS or some other Fancy Dan analytics system when there are open choice analytics systems a mouse click away? Search? Lots of open source choices.
We have entered the Gilded Age of information and data analysis. Do I have that right?
The marketers and young MBAs chasing venture funding instead of building revenue shout, “Yes, break out the top hats and cigars. We are riding a hockey stick type curve.”
Well, sort of. I read “Business Intelligence, Tackling Legacy Systems Top Priorities for CIOs.” Behind the consultant speak and fluff, there lurk two main points:
- Professionals in the US government, and I presume elsewhere, are struggling to make sense of “legacy” data; that is, information stuffed in file cabinets or sitting in an antiquated system down the hall.
- The problems facing information technology managers remain unresolved. After decades of effort by whiz kids, few organizations can provide basic information technology services.
As one Reddit thread made clear, most information technology professionals use Google to find a fix or read the manual. See Reddit and search for “secrets about work business”.
A useful comment about the inability to tap data appears in “Improving business intelligence and analytics the top tech priority, say Government CIOs.” Here’s the statement:
IT contracts expert Iain Monaghan of Pinsent Masons added: “Most suppliers want to sell new technology because this is likely to be where most of their profit will come from in future. However, they will have heavily invested in older technology and it will usually be cheaper for them to supply services using those products. Buyers need to balance the cost they are prepared to pay for IT with the benefits that new technology can deliver,” he said. “Suppliers are less resistant to renegotiating existing contracts if buyers can show that there is a reason for change and that the change offers a new business opportunity to the supplier. This is why constant engagement with suppliers is important. The contract is meant to embody a relationship with the supplier.”
Let me step back, way back. Last year my team and I prepared a report to tackle this question, “Why is there little or no progress in information access and content processing?”
We waded through the consultant chopped liver, the marketing baloney, and the mindless prose of thought leaders. Our finding was really simple. In fact, it was so basic we were uncertain about a way to present it without coming across like a stand up comedian at the Laugh House. To wit:
Computational capabilities are improving but the volume of content to be processed is growing rapidly. Software which could cope with basic indexing and statistical chores bottlenecks in widely used systems. As a result, the gap between what infrastructure and software can process and the amount of data to be imported, normalized, analyzed, and output is growing. Despite recent advances, most organizations are unable to keep pace with new content and changes to current content. Legacy content is in most cases not processed. Costs, time, and tools seem to be an intractable problem.
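The arithmetic behind this finding is simple: whenever the ingest rate exceeds the processing rate, the backlog grows without bound. A back-of-the-envelope sketch (the rates below are invented for illustration):

```python
# Back-of-the-envelope sketch of the gap described above: if content
# arrives faster than the pipeline can index it, the unprocessed
# backlog grows steadily. Both rates are invented numbers.
ingest_gb_per_day = 120   # new and changed content arriving
process_gb_per_day = 100  # what the indexing pipeline can handle

backlog = 0
for day in range(1, 31):  # one month
    backlog += max(0, ingest_gb_per_day - process_gb_per_day)

print(backlog)  # 600 GB of unprocessed content after 30 days
```

The point is not the specific numbers but the shape: a constant 20 percent shortfall compounds into a permanent, growing pool of unindexed legacy content.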
Flash forward to the problem of legacy information. Why not “sample” the data and use that? Sounds good. The problem is that even sampling is fraught with pitfalls, as most introductory statistics courses explain.
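The pitfall is easy to demonstrate. In the sketch below (entirely synthetic data), a “convenience” sample of the records easiest to reach gives a badly biased estimate, while a simple random sample of the same size lands near the truth:

```python
import random

random.seed(42)

# Synthetic "legacy archive": 10,000 records where the most recent
# records (the easiest to reach) have systematically higher values.
population = [i / 10_000 for i in range(10_000)]
true_mean = sum(population) / len(population)  # 0.49995

# Convenience sample: just grab the most recent 500 records.
convenience = population[-500:]
convenience_mean = sum(convenience) / len(convenience)

# Simple random sample of the same size.
srs = random.sample(population, 500)
srs_mean = sum(srs) / len(srs)

print(f"true mean        : {true_mean:.3f}")
print(f"convenience mean : {convenience_mean:.3f}")  # biased high
print(f"random mean      : {srs_mean:.3f}")          # close to truth
```

Same sample size, wildly different answers; the flaw is in how the records were chosen, not how many.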
How prevalent is flawed sampling? Some interesting examples from “everywhere” appear on the American Association for Public Opinion Research’s website. For me, I just need to reflect on the meetings in which I have participated in the last week or two. Examples:
- Zero revenue because no one matched the “product” to what the prospects wanted to buy
- Bad hires because no one double checked references. The excuse was, “Too busy” and “the system was down.”
- Client did not pay because “contracts person could not find a key document.”
Legacy data? Another problem of flawed business and technology practices. Will azure chip consultants and “motivated” MBAs solve the problem? Nah. Will flashy smart software be licensed and deployed? Absolutely. Will the list of challenges be narrowed in 2014? Good question.
Stephen E Arnold, August 16, 2013
Sponsored by Xenky
August 8, 2013
How much are we revealing of ourselves online? Every day we are hearing new information about how even the safest internet users are most likely wide open to spying. It’s hard to say what NSA whistleblower Edward Snowden thought would happen, but the world’s reaction is probably pretty close. The NSA isn’t the only one peeking, as we learned in a recent TIME article, “This MIT Website Tracks Your Digital Footprint.”
According to the article about a program called Immersion:
Much like the government phone-surveillance programs, Immersion doesn’t need to access the content of communications. Instead, by gathering information about the senders and recipients of all the e-mails in an inbox, it can create a detailed portrait of the user’s social connections. Each person’s picture on Immersion is as unique as a fingerprint, but much more informative.
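The metadata-only approach is easy to sketch. Immersion’s actual pipeline is not described here, but simply counting how often pairs of addresses co-occur on the same message already yields a weighted social graph (all addresses below are invented):

```python
from collections import Counter
from itertools import combinations

# Hypothetical inbox metadata: (sender, recipients) pairs only --
# no message bodies, mirroring Immersion's metadata-only approach.
messages = [
    ("alice@example.com", ["bob@example.com", "carol@example.com"]),
    ("bob@example.com",   ["alice@example.com"]),
    ("carol@example.com", ["alice@example.com", "bob@example.com"]),
    ("alice@example.com", ["bob@example.com"]),
]

# Edge weight = how often two addresses appear on the same message.
edges = Counter()
for sender, recipients in messages:
    people = sorted({sender, *recipients})
    for a, b in combinations(people, 2):
        edges[(a, b)] += 1

for (a, b), weight in edges.most_common():
    print(f"{a} <-> {b}: {weight}")
```

From four headers, the heaviest edge (alice and bob) already stands out; scale that to years of email and the “fingerprint” the article describes emerges with no message content read at all.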
While, sure, we treasure our privacy as much as anyone, this news and the NSA fiasco aren’t really much of a blip on our radar. Much like the President said, it’s not really a huge deal. If anyone who spends a lot of time online thinks they have anything resembling privacy, we have a bridge we’d love to sell them.
Patrick Roland, August 08, 2013
July 30, 2013
The medical field is always evolving with new advances. The same can be said of the medical technology field, especially mobile data analytics. In this space, today’s hot trend becomes a relic faster than in almost any other, so we try hard to keep tabs on developments like the one described in an illuminating CMS Wire article, “Temis Acquires i3 Analytics to Boost Text + Data Mining.”
According to the story:
“While we don’t know how much Temis paid out in this deal, we know doctors love iPads. This tells us pretty much all we need to know about this deal. i3 Analytics specializes in what it calls biopharma, what most of us know as pharmaceutical research or biotechnology.”
Advances in biotech and biopharma mean more data for doctors and drug companies to rummage through, something a company like i3 Analytics is more than happy to help them with.
This is an interesting story of healthcare analytics. Frankly, nothing surprises us anymore. Heck, we recently heard that Kansas City is the new boomtown for healthcare analytics. We think if things like this are possible, there’s no way this dynamic industry will stop changing anytime soon.
Patrick Roland, July 30, 2013
May 28, 2013
One of the main areas where companies are failing to collect data is mobile phones. Interestingly enough, Technology Review has this article to offer the informed reader: “Released: A Trove of Cell Phone Data-Mining Research.” Cell phone data offers a plethora of opportunities that is only starting to be tapped. Developed countries are not the only ones that can use the data; developing countries could benefit as well. It has been noted that cell phones could be used to redesign transportation networks and even yield eye-opening findings in epidemiology.
There is a global endeavor to understand the ramifications of cell phone data:
“Ahead of a conference on the topic that starts Wednesday at MIT, a mother lode of research has been made public about how to use this data. For the past year, researchers around the world responded to a challenge dubbed Data for Development, in which the telecom giant Orange released 2.5 billion records from five million cell-phone users in Ivory Coast. A compendium of this work is the D4D book, holding all 850 pages of the submissions. The larger conference, called NetMob (now in its third year), also features papers based on cell phone data from other regions, described in this book of abstracts.”
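The kind of aggregation behind the transportation studies mentioned above can be sketched in a few lines. The records below imitate the antenna-to-antenna call counts in the D4D release, but every value is invented:

```python
from collections import Counter

# Hypothetical call-detail records in the D4D style: each row is
# (caller_antenna, callee_antenna, n_calls). Real D4D files are far
# larger; this only illustrates the aggregation idea.
records = [
    (1, 2, 10),
    (2, 1, 4),
    (1, 3, 7),
    (3, 2, 5),
    (1, 2, 3),
]

# Total traffic between antenna pairs, direction ignored -- a crude
# proxy for people flows when redesigning a transport network.
flows = Counter()
for src, dst, n in records:
    flows[frozenset((src, dst))] += n

busiest, volume = flows.most_common(1)[0]
print(sorted(busiest), volume)  # the heaviest antenna-to-antenna link
```

Researchers then map the heaviest links against existing bus or road routes; a heavy link with no direct route is a candidate for a new connection.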
Before you get too excited, take note that privacy remains an important concern. No one has found a reliable way to dissociate users from their cell phone data. It may only be a matter of time before that happens; until then, we can explore the possibilities.
Whitney Grace, May 28, 2013
May 22, 2013
FMiner Pro extracts content from Web pages without requiring users to write their own scripts. After the data is extracted, much can be done with it:
“Extracted results can be saved to csv, Excel(xls), SQLite, Access, SQL Server, MySQL, PostgreSQL, and can specify the database fields’ types and attributes(eg, UNIQUE can avoid duplication of the extracted data). According to the setting, program can build, rebuild or load the database structure, and save the data to an existing database. Professional edition support incremental extraction, clear extraction and schedule extraction.”
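The deduplication behavior the quote mentions can be sketched with Python’s built-in sqlite3 module: a UNIQUE constraint plus INSERT OR IGNORE means re-running an extraction does not create duplicate rows (the schema and URLs here are invented for the demo):

```python
import sqlite3

# Minimal sketch of UNIQUE-based deduplication: re-extracted rows
# with an already-seen URL are silently skipped.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        url   TEXT UNIQUE,   -- UNIQUE prevents duplicate extractions
        title TEXT
    )
""")

rows = [
    ("http://example.com/a", "Page A"),
    ("http://example.com/b", "Page B"),
    ("http://example.com/a", "Page A"),  # duplicate, ignored
]
conn.executemany("INSERT OR IGNORE INTO items VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 2 -- the duplicate URL was not inserted twice
```

This is the same idea regardless of the backend: declare the natural key of an extracted record unique, and incremental or repeated crawls stay clean.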
FMiner Pro is available for a free fifteen-day trial to see how well it performs. After viewing the specs, we think FMiner Pro is worth a shot. It can probably save coders hours of script writing, and organizing Web content is a tedious job no one likes to do; having a program do it is much preferable.
Whitney Grace, May 22, 2013
March 27, 2013
For a simple explanation of content enrichment, there is “Web CMS Content Enrichment with OpenCalais, Crafter Rivet and Alfresco” on Rivet Logic Blogs. Content enrichment, the art of mining data and adding value to it, has now been organized by services such as OpenCalais, a free semantic data mining resource from Thomson Reuters. For use on your blog, website, or application, OpenCalais’s mission is to make “the world’s content more accessible.” The article explains,
“A few examples of content enrichment include: entity extraction, topic detection, SEO (Search Engine Optimization,) and sentiment analysis. Entity extraction is the process of identifying unique entities like people and places and tagging the content with it. Topic detection looks at the content and determines to some probabilistic measure what the content is about. SEO enrichment will look at the content and suggest edits and keywords that will boost the content’s search engine performance. Sentiment analysis can determine the tone or polarity (negative or positive) of the content.”
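Two of the enrichment steps listed in that quote, entity extraction and sentiment analysis, can be illustrated with a toy sketch. OpenCalais uses far more sophisticated NLP; the word lists and example text below are invented for the demo:

```python
# Toy content enrichment: dictionary-based entity extraction plus a
# word-counting sentiment polarity score. All lists are invented.
KNOWN_ENTITIES = {"London": "Place", "Thomson Reuters": "Company"}
POSITIVE = {"great", "excellent", "love"}
NEGATIVE = {"poor", "terrible", "hate"}

def enrich(text: str) -> dict:
    # Entity extraction: tag substrings that match known entities.
    entities = {name: kind for name, kind in KNOWN_ENTITIES.items()
                if name in text}
    # Sentiment: positive minus negative word hits.
    words = {w.strip(".,!").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    sentiment = ("positive" if score > 0
                 else "negative" if score < 0 else "neutral")
    return {"entities": entities, "sentiment": sentiment}

result = enrich("Thomson Reuters has a great office in London.")
print(result)
# sentiment: 'positive'; entities: London (Place),
# Thomson Reuters (Company)
```

Real services replace the hard-coded dictionaries with trained models and large knowledge bases, but the output shape is similar: the original text plus machine-readable tags a CMS can index.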
The tutorial in this article on using OpenCalais with the Crafter Rivet platform is short and straightforward. Without tools like OpenCalais, gaining the advantages of content enrichment would cost authors and content managers countless hours. The resources available can save time while improving the effectiveness of content.
Chelsea Kerwin, March 27, 2013
January 25, 2013
Even though many companies started researching big data initiatives for their organization, they did not actively pursue the technologies or the workforce needed to turn their data into gold. Experts in the field are opining in their predictions that 2013 will be the year that big data really hits and companies utilizing it will have a competitive advantage against others who are behind the curve. GigaOM reports on an opportunity for professionals interested in big data in the brief write-up, “Meet Big Data Bigwigs at Structure: Data.”
The opportunity for networking and learning from industry experts, called Structure:Data, will run March 20-21. The article tells us more:
‘Whether we know it or not, data — big, small or otherwise — is becoming a central component to the way we live our lives,’ says GigaOM writer Derrick Harris in his big data predictions for 2013. At Structure:Data we’ll delve into what lies ahead for big data, as we explore the technical and business opportunities that the growth of big data has created. Topics include case studies of big data implementations, the future of Hadoop, machine learning, the looming data-scientist crisis and the top trends in big data technologies.
There are plenty of insights and opportunities to be mined from big data, and some firms are already tapping into it. Tools like PolySpot, with scalable solutions that disseminate insights from terabytes of data in real time across the enterprise, make this feasible for everyone from small businesses to large corporations.
Megan Feil, January 25, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search