August 1, 2015
Want to shake free of proprietary search and retrieval systems? I don’t blame you. Irregular, slow bug fixes and licensing handcuffs are two good reasons. Remember: the cost of search is not the licensing fee. The cost is a collection of fees, purchases, and expenses with which every search system I am familiar with is burdened.
Elasticsearch is the go-to solution at this time, in my opinion. If you want a useful overview of Elasticsearch, check out the Slideshare presentation “Introduction to ElasticSearch.” You may have to “join” LinkedIn / Slideshare to do anything useful, however.
The deck was prepared and delivered in the spring of 2015 by Roy Russo, who is affiliated with “DevNexus.” The information is jargon-free, an approach the whiz kids at LucidWorks (Really?) may want to imitate. The presentation does contain a couple of buzzwords like NGram, but no MBA speak.
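For readers wondering what NGram means in practice: an ngram tokenizer chops text into overlapping character chunks, which is what makes partial-word matching work. Here is a minimal sketch in Python of the kind of index settings one might send to Elasticsearch, plus a pure-Python illustration of what the tokenizer produces. The tokenizer and analyzer names and the gram sizes are invented for illustration.

```python
# Illustrative Elasticsearch index settings: an "ngram" tokenizer splits
# text into overlapping character chunks of min_gram..max_gram characters.
settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "bigram_tokenizer": {   # hypothetical name
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 2,
                }
            },
            "analyzer": {
                "bigram_analyzer": {    # hypothetical name
                    "type": "custom",
                    "tokenizer": "bigram_tokenizer",
                }
            },
        }
    }
}

def ngrams(text, n=2):
    """Pure-Python illustration of what such a tokenizer emits."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(ngrams("search"))  # -> ['se', 'ea', 'ar', 'rc', 'ch']
```

A query for “ea” would then match any document whose indexed text contains those two adjacent characters, which is the trick behind substring-style search.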
Stephen E Arnold, August 1, 2015
August 1, 2015
Many years ago I loaded a software application from Autonomy. The application watched what I was “doing” and automatically displayed search results sort of relevant to what the software thought I was writing.
Flash forward to now. I read “Mike Lynch’s Cyber security Startup Darktrace Valued at More than £60m.” The point of the write up is that Dr. Mike Lynch has what looks like another success in his digital Bialetti Moka Express machine.
Darktrace monitors digital flows for signals. Instead of displaying search results, the system alerts security officers of a probable issue. Maybe Kenjin is not the influencer of the system. No matter. The company is “valued at more than $100 million.”
- The Hewlett Packard Autonomy hassle has not spoiled Dr. Lynch’s coffee
- Dr. Lynch is once again moving into a market sector in which some of the competitors are likely to be unaware of Dr. Lynch’s electric powered kitchen appliance taking over their coffee machine.
- Hewlett Packard may want to ask and answer: “Why did we lose this fellow?”
My hunch is that HP won’t ask the question and may not admit that the answer is not just technology. The murky world of management spoils an otherwise pristine cup of java. That’s a $100 million cup of joe.
Stephen E Arnold, August 1, 2015
July 30, 2015
I am fascinated with the cheerleading about open source software which makes Big Data as easy as driving a Fiat 500 through a car wash. (Make sure the wheels fit inside the automated pulley system, of course.)
Navigate to “The Big Big Data Question: Hadoop or Spark?” Be prepared to read about two—count ‘em—two systems working as smoothly as the engine in a technical high school’s auto repair class’ project car.
I want to highlight two statements in the write up.
The first is:
As I [a Big Data practitioner] mentioned, Spark does not include its own system for organizing files in a distributed way (the file system) so it requires one provided by a third-party. For this reason many Big Data projects involve installing Spark on top of Hadoop, where Spark’s advanced analytics applications can make use of data stored using the Hadoop Distributed File System (HDFS).
In short, Spark is what I call a wrapper. One uses it like a taco shell to keep the goods in position for real time munching.
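The layering described above — Spark doing the analytics, Hadoop’s HDFS holding the files — can be sketched as follows. This is a hedged illustration, not production code: the namenode host, port, and path are made-up placeholders, and the PySpark call only works against a real Spark + HDFS installation, so the import is deferred.

```python
def hdfs_uri(namenode: str, port: int, path: str) -> str:
    """Build the hdfs:// URI Spark would read from (the taco-shell boundary)."""
    return f"hdfs://{namenode}:{port}{path}"

def count_events(uri: str):
    """Sketch of Spark-on-Hadoop: HDFS stores the data, Spark computes on it.

    Requires a running Spark cluster with HDFS; everything here is illustrative.
    """
    from pyspark.sql import SparkSession  # deferred: pyspark may not be installed
    spark = SparkSession.builder.appName("hdfs-sketch").getOrCreate()
    df = spark.read.json(uri)   # the bytes live in HDFS (Hadoop's layer) ...
    return df.count()           # ... the computation happens in Spark

if __name__ == "__main__":
    # Hypothetical cluster coordinates:
    print(hdfs_uri("namenode.example.com", 8020, "/data/events.json"))
```

The point of the sketch is the division of labor: Spark never manages the files itself; it just points its reader at whatever file system sits underneath.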
The second is this comment:
The open source principle is a great thing, in many ways, and one of them is how it enables seemingly similar products to exist alongside each other – vendors can sell both (or rather, provide installation and support services for both, based on what their customers actually need in order to extract maximum value from their data).
What the write up omits is that there are some other bits and pieces needed; for example, how does one locate a particular string amid the Big Data?
The point, for me, is that these nested and layered systems are truly exciting to troubleshoot. Not only are there issues with the integrity of the data, there is the thrill of getting each subsystem to work and then figuring out how to get useful outputs from the digital equivalent of a Lassie’s Double Revenge sandwich from Roy’s Place, which closed its doors in 2013.
A Lassie’s Double Revenge consisted of a knockwurst, cheese, grilled onions, baked beans, and assorted seasonings served to the discerning diner.
A little like an open source Big Data mash up.
As a bonus, one gets to hire consultants who can make separate products, systems, and solutions work in a way which benefits the licensee and the system’s users.
Stephen E Arnold, July 30, 2015
July 30, 2015
Office 365 has been a bit contentious within the community. While Microsoft touts it as a solution that gives users more of the social and mobile components they have been asking for, it has not been widely adopted. IT Web gives some reasons to consider the upgrade in its article, “Why You Should Migrate SharePoint to Office 365.”
The article says:
“Although SharePoint as a technology has matured a great deal over the years, I still see many businesses struggling with issues related to on-premises SharePoint, says Simon Hepburn, director of bSOLVe… You may be thinking: ‘Are things really that different using SharePoint on Office 365?’ Office 365 is constantly evolving and as I will explain, this evolution brings with it opportunities that your business should seriously consider exploring.”
Of course the irony is that with the new SharePoint 2016 upgrade, Microsoft is giving users a promise to stand behind on-premises installations, yet it continues to integrate and promote the Office 365 components. Only time and feedback will dictate the continued direction of the enterprise solution. In the meantime, stay tuned to Stephen E. Arnold and his Web service, ArnoldIT.com. Arnold is a longtime leader in search and his dedicated SharePoint feed is a one-stop-shop for all the latest news, tips, and tricks.
Emily Rae Aldridge, July 30, 2015
July 30, 2015
For anyone using open-source Unix to work with data, IT World has a few tips for you in “The Best Tools and Techniques for Finding Data on Unix Systems.” In her regular column, “Unix as a Second Language,” writer Sandra Henry-Stocker explains:
“Sometimes looking for information on a Unix system is like looking for needles in haystacks. Even important messages can be difficult to notice when they’re buried in huge piles of text. And so many of us are dealing with ‘big data’ these days — log files that are multiple gigabytes in size and huge record collections in any form that might be mined for business intelligence. Fortunately, there are only two times when you need to dig through piles of data to get your job done — when you know what you’re looking for and when you don’t. 😉 The best tools and techniques will depend on which of these two situations you’re facing.”
When you know just what to search for, Henry-Stocker suggests the “grep” command. She supplies a few variations, complete with a poetic example. Sometimes, like when tracking errors, you’re not sure what you will find but do know where to look. In those cases, she suggests using the “sed” command. For both approaches, Henry-Stocker supplies example code and troubleshooting tips. See the article for the juicy details.
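For readers without a shell at hand, the “know what you’re looking for” case Henry-Stocker covers with `grep` can be sketched in Python. This is a minimal grep-like filter, not her code; the log lines below are invented for illustration, and `ignore_case=True` plays the role of `grep -i` while the returned line numbers mimic `grep -n`.

```python
import re

def grep(pattern, lines, ignore_case=False):
    """Minimal grep: return (line_number, line) pairs that match pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    rx = re.compile(pattern, flags)
    return [(n, line) for n, line in enumerate(lines, start=1) if rx.search(line)]

# Invented log lines standing in for a multi-gigabyte file:
log = [
    "INFO  service started",
    "ERROR disk quota exceeded",
    "INFO  heartbeat ok",
    "error transient timeout",   # lower case, the kind of match -i exists for
]

for number, line in grep("error", log, ignore_case=True):
    print(number, line)
```

The real `grep` streams through the file rather than holding it in memory, which is exactly why it remains the tool of choice for the multi-gigabyte logs the article mentions.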
Cynthia Murrell, July 30, 2015
July 28, 2015
One of the most frequently discussed SharePoint struggles is integrating SharePoint data with existing external data. IT Business Edge has compiled a short slideshow with helpful tips regarding integration, including the possible use of business connectivity services. See all the details in their presentation, “Eight Steps to Connect Office 365/SharePoint Online with External Data.”
The summary states:
“According to Mario Spies, senior strategic consultant at AvePoint, a lot of companies are in the process of moving their SharePoint content from on-premise to Office 365 / SharePoint Online, using tools such as DocAve Migrator from SharePoint 2010 or DocAve Content Manager from SharePoint 2013. In most of these projects, the question arises about how to handle SharePoint external lists connected to data using BDC. The good news is that SharePoint Online also supports Business Connectivity Services.”
To continue to learn more about the tips and tricks of SharePoint connectivity, stay tuned to ArnoldIT.com, particularly the SharePoint feed. Stephen E. Arnold is a lifelong leader in all things search, and his expertise is especially helpful for SharePoint. Users will continue to be interested in data migration and integration, and how things may be easier with the SharePoint 2016 update coming soon.
Emily Rae Aldridge, July 28, 2015
July 27, 2015
I read an amazing write up. The title of this gem of high school counseling is “7 Skills/Attitudes to Become a Better Data Scientist.” What does one need to be a better data scientist? Better Python or R programming methods? Sharper mathematical intuition? The ability to compute the least upper bound (sup) and greatest lower bound (inf) of a set of real numbers in your head, without paper, and none of that Mathematica software? Wrong.
What you need is intellectual curiosity, an understanding of business, the ability to communicate (none of the Cool Hand Luke pithiness), knowledge of more than one programming language, knowledge of SQL, participation in competitions, and a habit of reading articles like “7 Skills and Attitudes.”
Yep, follow these tips and you too can be a really capable data scientist. Why wait? Act now. Read the “7 Skills” article. Nah, don’t worry about such silly notions as data integrity or statistical procedures. Talk to someone, anyone, and you will be 14.28 percent — one seventh — of the way to your goal.
Stephen E Arnold, July 27, 2015
July 27, 2015
Support for open data, government datasets freely available to the public, has taken off in recent years; the federal government’s launch of Data.gov in 2009 is a prominent example. Naturally, some companies have sprung up to monetize this valuable resource. The New York Times reports, “Data Mining Start-Up Enigma to Expand Commercial Business.”
The article leads with a pro bono example of Enigma’s work: a project in New Orleans that uses that city’s open data to identify households most at risk for fire, so the city can give those folks free smoke detectors. The project illustrates the potential for good lurking in sets of open data. But make no mistake, the potential for profits is big, too. Reporter Steve Lohr explains:
“This new breed of open data companies represents the next step, pushing the applications into the commercial mainstream. Already, Enigma is working on projects with a handful of large corporations for analyzing business risks and fine-tuning supply chains — business that Enigma says generates millions of dollars in revenue.
“The four-year-old company has built up gradually, gathering and preparing thousands of government data sets to be searched, sifted and deployed in software applications. But Enigma is embarking on a sizable expansion, planning to nearly double its staff to 60 people by the end of the year. The growth will be fueled by a $28.2 million round of venture funding….
“The expansion will be mainly to pursue corporate business. Drew Conway, co-founder of DataKind, an organization that puts together volunteer teams of data scientists for humanitarian purposes, called Enigma ‘a first version of the potential commercialization of public data.’”
Other companies are getting into the game, too, leveraging open data in different ways. There’s Reonomy, which supplies research to the commercial real estate market. Seattle-based Socrata makes data-driven applications for government agencies. Information discovery company Dataminr uses open data in addition to Twitter’s stream to inform its clients’ decisions. Not surprisingly, Google is a contender with its Sidewalk Labs, which plumbs open data to improve city living through technology. Lohr insists, though, that Enigma is unique in the comprehensiveness of its data services. See the article for more on this innovative company.
Cynthia Murrell, July 27, 2015
July 25, 2015
I read “Contradictions of Big Data.” Few articles I see take a common sense approach to Big Data baloney. (Azure chip consultants bristle at my use of baloney. Too bad.) I liked this article.
The article appeared in my Overflight a day ago even though the write up was posted in March 2015. Big Data does not mean rapid data.
I highlighted this passage:
[I] have been waging an uphill battle against the nonsensical and unsubstantiated idea that more data is better data, but now this view is getting some additional support, and from some surprising corners.
I do not agree. The yap about Big Data has almost overpowered the craziness of search engine optimization’s shouting about semantic search.
The write up points out:
Take it from me [Martyn Jones], most businesses will not be basing their business strategies on the analysis of a glut of selfies, home videos of cute kittens, or the complete works of William Shakespeare or Dan Brown. Almost all business analysis will continue to be carried out on structured data obtained primarily from internal operational systems and external structured data providers.
The write up points out the silliness of velocity and several other slices of marketing baloney. (Make a sandwich, please.)
I found this paragraph insightful:
I have seen data scientists at work, and the word science doesn’t actually jump out and grab you. It’s difficult to make the connection, just as it is to accurately connect some popular science magazines with fundamental scientific research. If a professional and qualified statistician wants to label themselves a data scientist then I have no issue with that, it’s their problem, but I am not willing to lend credibility to the term ‘data scientist’ when it is merely an interesting job title, with at most a tenuous connection to the actual role, and one that is liberally applied, with the almost customary largesse of IT, to creative code hackers and business-averse dabblers in data.
Harsh words for those who combine an undergraduate degree minor in math with Twitter and come up with data scientist.
Hopefully others will pick up this practical approach to the sliced and processed meat wrapped in plastic and branded Big Data.
Stephen E Arnold, July 25, 2015
July 24, 2015
Humans are visual creatures, and they learn and absorb information better when pictures accompany it. In recent years, the graphic novel medium has gained popularity amongst all demographics. The amount of information a picture can communicate is astounding, but unless one knows to look for it, it can be hard to find. It also cannot be searched by a search engine…or can it? Synaptica is in the process of developing “OASIS Deep Image Indexing Using Linked Data.”
OASIS is an acronym for Open Annotation Semantic Imaging System, an application that unlocks image content by giving users the ability to examine an image more closely than before and to highlight data points. OASIS is a linked data application that enables parts of an image to be identified as linked data URIs, which can then be semantically indexed against controlled vocabulary lists. It builds an interactive map of an image, its features, and its conceptual ideas.
“With OASIS you will be able to pan-and-zoom effortlessly through high definition images and see points of interest highlight dynamically in response to your interaction. Points of interest will be presented along with contextual links to associated images, concepts, documents and external Linked Data resources. Faceted discovery tools allow users to search and browse annotations and concepts and click through to view related images or specific features within an image. OASIS enhances the ability to communicate information with impactful visual + audio + textual complements.”
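Synaptica has not published OASIS internals, but the core idea — pointing a linked data URI at a region of an image and tying it to a vocabulary concept — can be sketched with the W3C Web Annotation model. Treating OASIS as Web-Annotation-like is an assumption on my part; the image URI, concept URI, and pixel region below are all invented for illustration.

```python
import json

# Hypothetical annotation: a rectangular region of a high-resolution image,
# tagged with a concept URI from a controlled vocabulary.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {
        "type": "SpecificResource",
        "source": "http://vocab.example.org/concepts/knight",  # made-up vocabulary term
        "purpose": "tagging",
    },
    "target": {
        "source": "http://images.example.org/tapestry.jp2",    # made-up image URI
        "selector": {
            "type": "FragmentSelector",
            "conformsTo": "http://www.w3.org/TR/media-frags/",
            "value": "xywh=1200,340,400,250",  # x, y, width, height in pixels
        },
    },
}

print(json.dumps(annotation, indent=2))
```

Because both the region and the concept are URIs, a faceted search over annotations like this one is just a query over linked data — which is presumably how the “click through to related images or specific features” behavior described above would be wired up.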
OASIS is advertised as a discovery and interactive tool that gives users the chance to fully engage with an image. It can be applied to any field or industry, which might mean the difference between success and failure. People want to fully immerse themselves in their data or images these days. Being able to do so on a much richer scale is the future.
Whitney Grace, July 24, 2015