Set Data Free from PDF Tables

April 13, 2015

The PDF file is a wonderful thing. It takes up less space than alternatives, and everyone with a computer should be able to open one. However, it is not so easy to pull data from a table within a PDF document. Now, Computerworld informs us about a “Free Tool to Extract Data from PDFs: Tabula.” Created by journalists with assistance from organizations like Knight-Mozilla OpenNews, the New York Times and La Nación DATA, Tabula plucks data from tables within these files. Reporter Sharon Machlis writes:

“To use, download the software from the project website. It runs locally in your browser and requires a Java Runtime Environment compatible with Java 6 or 7. Import a PDF and then select the area of a table you want to turn into usable data. You’ll have the option of downloading as a comma- or tab-separated file as well as copying it to your clipboard.

“You’ll also be able to look at the data it captures before you save it, which I’d highly recommend. It can be easy to miss a column and especially a row when making a selection.”

See the write-up for a video of Tabula at work on a Windows system. A couple of caveats: the tool will not work with scanned images. Also, the creators caution that, as of yet, Tabula works best with simple table formats. Any developers who wish to get in on the project should navigate to its GitHub page.
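Machlis's advice to check the captured data before saving can also be applied after export. The following is a minimal Python sketch, not part of Tabula itself, that assumes the table was saved as a comma-separated file and flags any row whose field count differs from the header row. The function name and the sample data are hypothetical.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (row_number, field_count) for rows that do not match the header width.

    Row numbers are 1-based and count the header as row 1.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # empty export, nothing to check
    expected = len(header)
    ragged = []
    for row_number, row in enumerate(reader, start=2):
        if len(row) != expected:
            ragged.append((row_number, len(row)))
    return ragged


if __name__ == "__main__":
    # Hypothetical export in which the last row lost a column during selection.
    sample = "name,qty,price\napple,3,1.20\npear,2\n"
    for row_number, count in find_ragged_rows(sample):
        print(f"row {row_number} has {count} fields")
```

A check like this only catches rows of the wrong width; a selection that misses an entire column on every row would still need a visual comparison against the PDF.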

Cynthia Murrell, April 13, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Bing Predicts it Will Have Decent Results

April 13, 2015

Bing is considered a search engine joke, but it might be working its way toward becoming a viable search solution… maybe. MakeUseOf notes, “How Bing Predicts Has Become So Good” due to Microsoft actually listening to its users and improving the search results with the idea that “Bing is for doing.” One way Microsoft is putting its search engine to work is with Bing Predicts, a tool that predicts who will win competitions, the weather, and other outcomes based on analysis of popular searches, social media, regional trends, and more.

It takes a bit more for Predicts to divine sporting event outcomes. For those, Bing relies on historic team data, key player data, opinions from top news sources, and pre-game report predictions.

“Microsoft researcher, and serial predictor David Rothschild believes the prediction engine is ‘an interesting way to show users that Bing has a lot of horsepower beyond just providing good search results.’ Data is everything. Even regular Internet users understand the translation of data to power, so Microsoft’s bold step forward with their predictions underscores the confidence in their own algorithms, and their ability to handle the data coming into Redmond.”

Beyond predicting games and the next American Idol winner, Bing Predicts has applications in social fields and industry. Companies are already implementing some forms of future analysis, and for social causes it can be used to predict the best ways to conserve resources, medicinal supplies, food, and even conservatism.

Whitney Grace, April 13, 2015


Useful Probability Lesson in Monte Carlo Simulations

April 6, 2015

It is no surprise that probability blogger Count Bayesie, also known as data scientist Will Kurt, likes to play with random data samples like those generated in Monte Carlo simulations. He lets us in on the fun in this useful summary, “6 Neat Tricks with Monte Carlo Simulations.” He begins:

“If there is one trick you should know about probability, it’s how to write a Monte Carlo simulation. If you can program, even just a little, you can write a Monte Carlo simulation. Most of my work is in either R or Python, these examples will all be in R since out-of-the-box R has more tools to run simulations. The basics of a Monte Carlo simulation are simply to model your problem, and then randomly simulate it until you get an answer. The best way to explain is to just run through a bunch of examples, so let’s go!”

And run through his six examples he does, starting with the ever-popular basic integration. Other tricks include approximating the binomial distribution, approximating Pi, finding p-values, creating games of chance, and, of course, predicting the stock market. The examples include code snippets and graphs. Kurt encourages readers to go further:

“By now it should be clear that a few lines of R can create extremely good estimates to a whole host of problems in probability and statistics. There comes a point in problems involving probability where we are often left no other choice than to use a Monte Carlo simulation. This is just the beginning of the incredible things that can be done with some extraordinarily simple tools. It also turns out that Monte Carlo simulations are at the heart of many forms of Bayesian inference.”

See the write-up for the juicy details of the six examples. This fun and informative lesson is worth checking out.
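To give a flavor of two of the tricks named above, here is a minimal sketch of approximating Pi and of basic integration. Kurt's examples are in R; this version is in Python and is not his code. The function names, sample sizes, and seeds are assumptions chosen for illustration.

```python
import random


def estimate_pi(n_samples, seed=None):
    """Estimate Pi by sampling points in the unit square.

    The fraction of points landing inside the quarter circle of radius 1
    approximates Pi / 4.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def mc_integrate(f, a, b, n_samples, seed=None):
    """Estimate the integral of f over [a, b] as (b - a) times the mean of f
    at uniformly random points in the interval."""
    rng = random.Random(seed)
    total = sum(f(rng.uniform(a, b)) for _ in range(n_samples))
    return (b - a) * total / n_samples


if __name__ == "__main__":
    print("Pi estimate:", estimate_pi(200_000, seed=0))
    # The exact integral of x^2 over [0, 1] is 1/3.
    print("Integral estimate:", mc_integrate(lambda x: x * x, 0.0, 1.0, 100_000, seed=1))
```

Both functions follow the recipe in the quoted passage: model the problem, then simulate it at random many times. The error of each estimate shrinks roughly with the square root of the number of samples, so more samples buy accuracy slowly.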

Cynthia Murrell, April 6, 2015


Microsoft Delve and PowerBI Make Data User Friendly

March 30, 2015

Microsoft Delve is a new part of the Office 365 package, and it is similar to Facebook Graph Search or your Internet browsing history. ChannelWorld posted “Microsoft Rolls Out Delve To Office 365, Previews PowerBi And Skype For Business.” Microsoft will release Delve soon, and it comes as demand for relationship-building tools grows. Delve tracks information from Office 365 applications such as Outlook, PowerPoint, Bing, Word, and more. Microsoft is calling the collected data the Office Graph, showing how people interact with the software.

PowerBI is another rollout from Microsoft:

“Microsoft also announced that it has now rolled out the technical preview of PowerBI for Excel around the world, following its launch a year ago. PowerBI is designed to be a tool for non-techies to access technical data, quickly composing their own sales reports through natural-language queries against robust data sources–typing in a query like “what was our most popular product in Brazil last year?” should deliver a graph or even a map of those results. Incorporating Google Analytics, Microsoft Dynamics Marketing, Acumatica, Zuora and Twilio will come soon, Microsoft said.”

Microsoft will also incorporate Skype in Office 365. Office 365 is one of Microsoft’s most viable products, and people have complained that the company has not done much with it in recent years. Upgrades like Skype, Delve, and PowerBI demonstrate that Microsoft is still invested in making Office 365 a competitive, usable, and reliable product.

Whitney Grace, March 30, 2015

Partnership Between Twitter and IBM Showing Results

March 27, 2015

The article on TechWorld titled IBM Boosts BlueMix and Watson Analytics with Twitter Integration investigates the fruits of the partnership between IBM and Twitter, which began in 2014. IBM Bluemix now offers Twitter as one of the services in its cloud-based developer environment. Watson Analytics will also be integrated with Twitter for the creation of visualizations. Developers will be able to grab data from Twitter for better insights into patterns and relationships.

“The Twitter data is available as part of that service so if I wanted to, for example, understand the relationship between a hashtag on pizza, burgers or tofu, I can go into the service, enter the hashtag and specify a date range,” said Rennie. “We [IBM] go out, gather information and essentially calculate what is the sentiment against those tags, what is the split by location, by gender, by retweets, and put it into a format whereby you can immediately do visualisation.”

From the beginning of the partnership, Twitter gave IBM access to its data and the go-ahead to use Twitter with the cloud-based developer tools. Watson looks like a catch-all for data, and the CMO of Brandwatch, Will McInnes, suggests that Twitter is only the beginning. The potential of data from social media is a vast and constantly rearranging field.

Chelsea Kerwin, March 27, 2015


Glimpses of SharePoint 2016 on the Way

March 26, 2015

The tech world is excited for the upcoming SharePoint 2016 release. Curious parties will be glad to hear that sneak peeks will be coming this spring. Read more in the CMS Wire article, “Microsoft Leaks Offer a Glimpse of SharePoint 2016.”

The article lays out some of the details:

“Microsoft has started leaking news about SharePoint 2016 — and they suggest the company plans to showcase an early edition at Ignite, its upcoming all-in-one conference for everyone from senior decision makers, IT pros and “big thinkers” and to enterprise developers and architects. In a just released podcast, Bill Baer, senior product manager for SharePoint, said the company will offer a look at the latest version of SharePoint at the conference, which will be held in Chicago from May 4 through 8.”

Some experts have already weighed in with predictions for SharePoint 2016 features, hybrid search and improved user experience among them. Stephen E. Arnold will also be keeping an eye on the new version, reporting his findings on his dedicated SharePoint feed. He has devoted his career to all things search, including SharePoint, and keeps readers informed on his Web site ArnoldIT.com. Stay tuned for more updates on SharePoint 2016 as they become available.

Emily Rae Aldridge, March 26, 2015


Digital Shadows Searches the Shadow Internet

March 23, 2015

The deep Web is not hidden from Internet users, but regular search engines like Google and Bing do not index it in their results.  Security Affairs reported on a new endeavor to search the deep Web in the article, “Digital Shadows Firm Develops A Search Engine For The Deep Web.”  Memex and Flashpoint are two search engine projects that are already able to scan the deep Web.  Digital Shadows, a British cyber security firm, is working on another search engine specially designed to search the Tor network.

The CEO of Digital Shadows, Alistair Paterson, describes the project as Google for Tor. The article explains its purpose:

“Digital Shadows developed the deep Web search engine to offer its services to private firms to help them identifying cyber threats or any other illegal activity that could represent a threat.”

While private firms will need and want this software to detect illegal activities, law enforcement officials currently need deep Web search tools more than anyone else. They use them to track fraud, drug and sex trafficking, robberies, and contraband. Digital Shadows is creating a product that is part of a growing industry. The company will not only make a profit, but also help people at the same time.

Whitney Grace, March 23, 2015

Data and Marketing Come Together for a Story

March 23, 2015

An article on the Marketing Experiments Blog titled Digital Analytics: How To Use Data To Tell Your Marketing Story explains the primacy of the story in the world of data. The conveyance of the story, the article claims, should be a collaboration between the marketer and the analyst, with both players working together to create an engaging and data-supported story. The article suggests breaking this story into several parts, similar to the plot points you might study in a creative writing class: Exposition, Rising Action, Climax, Denouement, and Resolution. The article states,

“Nate [Silver] maintained throughout his speech that marketers need to be able to tell a story with data or it is useless. In order to use your data properly, you must know what the narrative should be…I see data reporting and interpretation as an art, very similar to storytelling. However, data analysts are too often siloed. We have to understand that no one writes in a bubble, and marketing teams should understand the value and perspective data can bring to a story.”

Silver, Founder and Editor in Chief of FiveThirtyEight.com, is also quoted in the article from his talk at the Adobe Summit Digital Marketing Conference. He said, “Just because you can’t measure it, doesn’t mean it’s not important.” This is the back-to-basics approach that companies need to consider.

Chelsea Kerwin, March 23, 2015


Give Employees the Data they Need

March 19, 2015

A classic quandary: will it take longer to reinvent a certain proverbial wheel, or to find the documentation from the last time one of your colleagues reinvented it? That all depends on your organization’s search system. An article titled “Help Employees to ‘Upskill’ with Access to Information” at DataInformed makes the case for implementing a user-friendly, efficient data-management platform. Writer Diane Berry, not coincidentally a marketing executive at enterprise-search company Coveo, emphasizes that re-covering old ground can really sap workers’ time and patience, ultimately impacting customers. Employees simply must be able to quickly and easily access all company data relevant to the task at hand if they are to do their best work. Berry explains why this is still a problem:

“Why do organizations typically struggle with implementing these strategies? It revolves around two primary reasons. The first reason is that today’s heterogeneous IT infrastructures form an ‘ecosystem of record’ – a collection of newer, cloud-based software; older, legacy systems; and data sources that silo valuable data, knowledge, and expertise. Many organizations have tried, and failed, to centralize information in a ‘system of record,’ but IT simply cannot keep up with the need to integrate systems while also constantly moving and updating data. As a result, information remains disconnected, making it difficult and time consuming to find. Access to this knowledge often requires end-users to conduct separate searches within disconnected systems, often disrupting co-workers by asking where information may be found, and – even worse – moving forward without the knowledge necessary to make sound decisions or correctly solve the problem at hand.

“The second reason is more cultural than technological. Overcoming the second roadblock requires an organization to recognize the value of information and knowledge as a key organizational asset, which requires a cultural shift in the company.”

Fair enough; she makes a good case for a robust, centralized data-management solution. But what about that “upskill” business? Best I can tell, it seems the term is not about improving skills, but about supplying employees with resources they need to maximize their existing skills. The term was a little confusing to me, but I can see how it might be catchy. After all, marketing is the author’s forte.

Cynthia Murrell, March 19, 2015


Looking Towards 2015’s Data Trends

March 5, 2015

Here we go again! Another brand new year, and it is time to predict where data will take us. For the past few years it has been all about big data, and while it has a solid base, other parts of data science are coming into the limelight. While LinkedIn is a social network for professionals, one can also read articles there on career advice, hot topics, and new trends in various fields. Kurt Cagle is a data science expert and has written on the topic for over ten years. His recent article, “Ten Trends In Data Science In 2015,” was posted on LinkedIn in December.

He calls the four data science areas the Data Cycle: analysis, awareness, governance, and acquisition. From Cagle’s perspective, 2014 saw big data mature, demand for data visualization software rise, and semantics grow. He predicts 2015 will hold much of the same:

“…with the focus shifting more to the analytics and semantic side, and Hadoop (and Map/Reduce without Hadoop) becoming more mainstream. These trends benefit companies looking for a more comprehensive view of their information environment (both within and outside the company), and represent opportunities in the consulting space for talented analysts, programmers and architects.”

Data visualization is going to get even bigger in the coming year. Hybrid data stores with more capabilities will become more common, semantics will grow even larger and specialized companies will be bought up, and there will be more competition for Hadoop. Cagle also predicts that work will be done on a universal query language and that data analytics will move beyond standard SQL.

His closing observations explain that data silos will be phased into open data platforms, making technology easier for people to use and making systems more compatible with one another.

Whitney Grace, March 05, 2015
Sponsored by ArnoldIT.com, developer of Augmentext
