HonkinNews for August 16, 2016
August 16, 2016
The weekly news program about search, online, and content processing is now available at https://youtu.be/mE3MGlmrUWc. In addition to comments about Goo!Hoo, IBM, and Microsoft, you will learn about grilling squirrel over a wood fire. Live from Harrod’s Creek.
Stephen E Arnold, August 16, 2016
IBM’s Champion Human Resources Department Announces “Permanent” Layoff Tactics
August 16, 2016
The article on Business Insider titled Leaked IBM Email Says Cutting “Redundant” Jobs Is a “Permanent and Ongoing” Part of Its Business Model explores the language and overall human resource strategy of IBM. IBM personnel in the Netherlands learned in the email that layoffs are coming and that layoffs will be a regular aspect of how IBM “optimizes” its workforce. The article tells us:
“IBM isn’t new to layoffs, although these are the first to affect the Netherlands. IBM’s troubled business units, like its global technology services unit, are shrinking faster than its booming businesses, like its big data/analytics, machine learning (aka Watson), and digital advertising agency are growing…All told, IBM eliminated and gained jobs in about equal numbers last year, it said. It added about 70,000 jobs, CEO Rometty said, and cut about that number, too.”
IBM seems to be performing a balancing act: gaining personnel in areas like data analytics while shedding employees in areas that are less successful, or “redundant.” This allows the company to break even, although the employees it fires might feel that Watson itself could have delivered the news more gracefully and with more tact than the IBM HR department did. At any rate, we assume that IBM’s senior management asked Watson what to do and that this permanent layoffs strategy was the informed answer provided by the supercomputer.
Chelsea Kerwin, August 16, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden/Dark Web meet up on August 23, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233019199/
Yippy Revealed: An Interview with Michael Cizmar, Head of Enterprise Search Division
August 16, 2016
In an exclusive interview, Yippy’s head of enterprise search reveals that Yippy has launched an enterprise search technology to which Google Search Appliance users are converting now that Google is sunsetting its GSA products.
Yippy has also set its sights on the rest of the high-growth market for cloud-based enterprise search. Not familiar with Yippy, its IBM tie-up, and its implementation of the Velocity search and clustering technology? Yippy’s Michael Cizmar gives some insight into the company’s search-and-retrieval vision.
Yippy (OTC PINK: YIPI) is a publicly traded company providing search, content processing, and engineering services. The company’s catchphrase is, “Welcome to your data.”
The core technology is the Velocity system, developed by Carnegie Mellon computer scientists. Yippy obtained rights to the Velocity technology before IBM acquired Vivisimo. I learned from my interview with Mr. Cizmar that IBM is one of the largest shareholders in Yippy. Other facets of the deal included some IBM Watson technology.
This year (2016) Yippy purchased one of the most recognized firms supporting the now-discontinued Google Search Appliance. Yippy has been tallying important accounts and expanding its service array.
Michael Cizmar, Yippy’s senior manager for enterprise search
Beyond Search interviewed Michael Cizmar, the head of Yippy’s enterprise search division. Cizmar founded MC+A and built a thriving business around the Google Search Appliance. When Google stepped away from on-premises hardware, Yippy seized the opportunity to bolster its expanding business.
I spoke with Cizmar on August 15, 2016. The interview revealed a number of little-known facts about a company that is gaining traction in the enterprise information market.
Cizmar told me that when the Google Search Appliance was discontinued, he realized that the Yippy technology could fill the void and offer more effective enterprise findability. He said, “When Yippy and I began to talk about Google’s abandoning the GSA, I realized that by teaming up with Yippy, we could fill the void left by Google, and in fact, we could surpass Google’s capabilities.”
Cizmar described the advantages of the Yippy approach to enterprise search this way:
We have an enterprise-proven search core. The Vivisimo engineers leapfrogged the technology dating from the 1990s which forms much of Autonomy IDOL, Endeca, and even Google’s search. We have the connector libraries that we acquired from Muse Global. We have used the security experience gained via the Google Search Appliance deployments and integration projects to give Yippy what we call “field level security.” Users see only the parts of content they are authorized to view. Also, we have methodologies and processes to allow quick, hassle-free deployments in commercial enterprises to permit public access, private access, and hybrid or mixed system access situations.
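The “field level security” idea Cizmar describes can be sketched in a few lines. This is a minimal illustration of the general concept, not Yippy’s actual implementation; all names and the ACL scheme here are hypothetical.

```python
# Minimal sketch of field-level security: each indexed field carries an
# access-control list (ACL), and the search layer strips any field the
# querying user is not authorized to view. Hypothetical names throughout;
# this is not Yippy's API, just the general idea.

def filter_fields(document, user_groups):
    """Return a copy of the document containing only fields the user may view."""
    visible = {}
    for field, payload in document.items():
        acl = payload.get("acl", set())
        if acl & user_groups:  # user belongs to at least one authorized group
            visible[field] = payload["value"]
    return visible

doc = {
    "title":  {"value": "Q3 Budget",        "acl": {"staff", "finance"}},
    "body":   {"value": "Summary...",       "acl": {"staff", "finance"}},
    "salary": {"value": "Redacted details", "acl": {"finance"}},
}

print(filter_fields(doc, {"staff"}))    # title and body only
print(filter_fields(doc, {"finance"}))  # all three fields visible
```

The design point is that filtering happens per field at query time, so two users running the same query can legitimately see different slices of the same document.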
With the buzz about open source, I wanted to know where Yippy fit into the world of Lucene, Solr, and the other enterprise software solutions. Cizmar said:
I think the customers are looking for vendors who can meet their needs, particularly with security and smooth deployment. In a couple of years, most search vendors will be using an approach similar to ours. Right now, however, I think we have an advantage because we can perform the work directly….Open source search systems do not have Yippy-like content intake or content ingestion frameworks. Importing text or an Oracle table is easy. Acquiring large volumes of diverse content continues to be an issue for many search and content processing systems…. Most competitors are beginning to offer cloud solutions. We have cloud options for our services. A customer picks an approach, and we have the mechanism in place to deploy in a matter of a day or two.
Connecting to different types of content is a priority at Yippy. Even though the company has a wide array of import filters and content processing components, Cizmar revealed that Yippy has “enhanced the company’s connector framework.”
I remarked that most search vendors do not have a framework, relying instead on expensive components licensed from vendors such as Oracle and Salesforce. He smiled and said, “Yes, a framework, not a widget.”
Cizmar emphasized that the Yippy, IBM, and Google connections were important to many of the company’s customers. He added that Yippy has also acquired the Muse Global connectors and the ability to build connectors on the fly. He observed:
Nobody else has Watson Explorer powering the search, and nobody else has the Google Innovation Partner of the Year deploying the search. Everybody tries to do it. We are actually doing it.
Cizmar made an interesting side observation. He suggested that Internet search needed to be better. Is indexing the entire Internet in Yippy’s future? Cizmar smiled. He told me:
Yippy has a clear blueprint for becoming a leader in cloud computing technology.
For the full text of the interview with Yippy’s head of enterprise search, Michael Cizmar, navigate to the complete Search Wizards Speak interview. Information about Yippy is available at http://yippyinc.com/.
Stephen E Arnold, August 16, 2016
Mixpanel Essay Contains Several Smart Software Gems
August 11, 2016
I read “The Hard Thing about Machine Learning.” The essay explains the history of machine learning at Mixpanel. Mixpanel is a business analytics company. Embedded in the write up are several observations which I thought warranted highlighting.
The first point is the blunt reminder that machine learning requires humans—typically humans with specialist skills—to make smart software work as expected. The humans have to figure out what problem they and the numerical recipes are supposed to solve. Mixpanel says:
machine learning isn’t some sentient robot that does this all on its own. Behind every good machine learning model is a team of engineers that took a long thoughtful look at the problem and crafted the right model that will get better at solving the problem the more it encounters it. And finding that problem and crafting the right model is what makes machine learning really hard.
The second pink circle in my copy of the essay corralled this observation:
The broader the problem, the more universal the model needs to be. But the more universal the model, the less accurate it is for each particular instance. The hard part of machine learning is thinking about a problem critically, crafting a model to solve the problem, finding how that model breaks, and then updating it to work better. A universal model can’t do that.
I think this means that machine learning works on quite narrow, generally tidy problems. Anyone who has worked with the mid-1990s Autonomy IDOL system knows that as content flows into a properly trained system, that “properly trained” system can start to throw some imprecise and off-point outputs. The fix is to retrain the system on a properly narrowed data set. Failure to do this would cause users to scratch their heads because they could not figure out why their query about computer terminals generated outputs about railroad costs. The culprit is the word “terminal” combined with increasingly diverse content flowing into the system.
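The “terminal” drift can be shown with a toy keyword index. This is a deliberately simplified sketch in plain Python, not IDOL or any particular engine: a term index that is precise while the corpus is narrow starts mixing in off-point documents as diverse content arrives.

```python
# Toy illustration of the "terminal" problem: a keyword index built over
# computing content is precise, then starts returning rail-freight documents
# once more diverse content flows in. No particular search engine implied.

from collections import defaultdict

index = defaultdict(set)  # term -> set of doc ids containing that term

def ingest(doc_id, text):
    """Add a document's terms to the inverted index."""
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query):
    """Return the ids of documents containing the query term."""
    return sorted(index.get(query.lower(), set()))

ingest("doc1", "configuring a computer terminal session")
print(search("terminal"))  # ['doc1'] -- precise while the corpus is narrow

ingest("doc2", "railroad terminal freight cost overruns")
print(search("terminal"))  # ['doc1', 'doc2'] -- railroad costs now intrude
```

Nothing in the index is “wrong”; the ambiguity of the term itself is what degrades results, which is why the fix is narrowing (or disambiguating) the training and content set rather than patching the engine.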
The third point received a check mark from this intrepid reader:
Correlation does not imply causation.
Interesting. I think one of my college professors in 1962 made a similar statement. Pricing for Mixpanel begins at $600 per month for four million data points.
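The professor’s point is easy to demonstrate numerically. In this sketch (illustrative data, invented for the example), two series that each follow the same seasonal trend correlate almost perfectly even though neither causes the other.

```python
# A small numeric reminder that correlation does not imply causation:
# two series that both trend upward over the same months (say, ice cream
# sales and drowning incidents through summer) correlate strongly even
# though neither causes the other -- a shared driver (warm weather) does.

import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

months = range(1, 7)
ice_cream = [10 + 5 * m for m in months]   # trends upward with the season
drownings = [2 + 1.5 * m for m in months]  # also trends upward

print(round(pearson(ice_cream, drownings), 3))  # 1.0 -- correlated, not causal
```

A model that treats such a correlation as causal will confidently recommend the wrong intervention, which is exactly the kind of mistake the Mixpanel essay warns about.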
Stephen E Arnold, August 11, 2016
Semantify Secures Second Funding Round
August 4, 2016
Data-management firm Semantify has secured more funding, we learn from “KGC Capital Invests in Semantify, Leaders in Cognitive Discovery and Analytics” at Benzinga. The write-up tells us primary investor KGC Capital was joined by KDWC Venture Fund and Bridge Investments in making the investment, as well as by existing investors (including its founder, Vishy Dasari). The funds from this Series A funding round will be used to address increased delivery, distribution, and packaging needs.
The press release describes Semantify’s platform:
“Semantify automates connecting information in real time from multiple silos, and empowers non-technical users to independently gain relevant, contextual, and actionable insights using a free form and friction-free query interface, across both structured and unstructured content. With Semantify, there would be no need to depend on data experts to code queries and blend, curate, index and prepare data or to replicate data in a new database. A new generation self-service enterprise Ad-hoc discovery and analytics platform, it combines natural language processing (NLP), machine learning and advanced semantic modeling capabilities, in a single seamless proprietary platform. This makes it a pioneer in democratization of independent, on demand information access to potentially hundreds of millions of users in the enterprise and e-commerce world.”
Semantify cites its “fundamentally unique” approach to developing data-management technology as the force behind its rapid deployment cycles, low maintenance needs, and lowered costs. Formerly based in Delaware, the company is moving its headquarters to Chicago (where its investors are based). Semantify was founded in 2008. The company is also hiring; its About page declares, toward the bottom: “Growing fast. We need people.” As of this writing, it is seeking database/BI experts, QA specialists, data scientists and knowledge modelers, business analysts, program and project managers, and team leads.
Cynthia Murrell, August 4, 2016
Honkin News: Beyond Search Video News Program Available Now
August 2, 2016
Honkin’ News is now online via YouTube at https://youtu.be/hf93zTSixgo. The weekly program tries to separate the giblets from the goose feathers in online search and content processing. Each program draws upon articles and opinion appearing in the Beyond Search blog.
The Beyond Search program is presented by Stephen E Arnold, who resides in rural Kentucky. The five-minute program highlights stories appearing in the daily Beyond Search blog and includes observations not appearing in the printed versions of the stories. No registration is required to view the free video.
Arnold told Beyond Search:
Online search and content processing generate modest excitement. Honkin’ News comments on some of the more interesting and unusual aspects of information retrieval, natural language processing, and the activities of those working to make software understand digital content. The inaugural program highlights Verizon’s Yahoo AOL integration strategy, explores why search fails, and how manufacturing binders and fishing lures might boost an open source information access strategy.
The video is created using high tech found in the hollows of rural Kentucky; for example, eight millimeter black-and-white film and two coal-fired computing devices. One surprising aspect of the video is the vista visible through the window of the Beyond Search facility. The pond filled with mine drainage is not visible, however.
Kenny Toth, August 2, 2016
Is IBM Vulnerable to OpenText?
July 21, 2016
I read “Hey, IBM, OpenText Is Coming for You.” The write up reports that the poobah of OpenText said that its new Magellan system is “a next generation analytics platform.” Getting from Yet another OpenText system (YOTS) to the nemesis of IBM is quite a leap.
But here’s the statement, once again from the OpenText poobah, that caught my attention:
But even more interesting than the product itself, is the bullish way in which OpenText is calling out IBM Watson. “We are going to position it directly against Watson. We’re not going to shy away from that at all,” Mark said. “We think there’s a whole class of problems that enterprises want to solve themselves and what they need is an affordable platform, one that’s open and programmable to them and accessible to them and that’s going to be Magellan. So we’re going to position ourselves and stay focused directly against Watson.”
The write up explains that OpenText Magellan is better, faster, and cheaper. I have heard that before I think. But the details are interesting.
Magellan’s software is open. Its hardware is open. Its IP is owned by the licensee. Its deployment options are “run anywhere.” It is extensible by the licensee. Its ecosystem is open. Its cost is a mere one dollar sign.
And what about IBM Watson? Its software is closed. Its hardware is closed. Its IP ownership is not the licensee’s. Watson is extensible only by IBM Global Services. IBM’s ecosystem is closed. Best of all, IBM’s cost is six dollar signs.
OpenText is a $2 billion a year outfit. The hapless IBM, despite being lost in revenue space, is about $90 billion a year.
My view is that OpenText is swinging for the marketing and conceptual fences. IBM is trying to find the secret notebook that unlocks revenues.
I would point out that Fancy Dan software designed to help executives make better decisions is plentiful. Many vendors covet this niche. There is excitement ahead. Both OpenText and IBM may find that talk about smart software flows more rapidly than sustainable revenue and healthy profits. Keep in mind the high cost of technological debt. That’s one point on which IBM and OpenText share a common weakness.
Stephen E Arnold, July 21, 2016
The Watson Update
July 15, 2016
IBM invested a great deal of resources, time, and money into developing the powerful artificial intelligence computer Watson. The company has been trying for years to justify the expense as well as make money off its invention, mostly by having Watson try every conceivable industry that could benefit from big data, from cooking to medicine. We finally have an update on Watson, says ZDNet in the article, “IBM Talks About Progress On Watson, OpenPower.”
Watson is a cognitive computing system that learns, supports natural user interfaces, values user expertise, and evolves with new information. Evolving is the most important trait, because it allows Watson to keep gaining experience and learning. When Watson was first developed, IBM fed it general domain knowledge, then built the Watson Discovery capability to find answers to specific questions. This approach has been used in the medical field to digest all the information created and apply it to practice.
IBM also did this:
“Most recently IBM has been focused on making Watson available as a set of services for customers that want to build their own applications with natural question-and-answer capabilities. Today it has 32 services available on the Watson Developer Cloud hosted on its Bluemix platform-as-a-service… Now IBM is working on making Watson more human. This includes a Tone Analyzer (think of this as a sort of spellchecker for tone before you send that e-mail to the boss), Emotion Analysis of text, and Personality Insights, which uses things you’ve written to assess your personality traits.”
Cognitive computing has come very far since Watson won Jeopardy. Pretty soon the technology will be more integrated into our lives. The bigger question is how it will change society and how we live.
Whitney Grace, July 15, 2016
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on July 26, 2016. Information is at this link: http://bit.ly/29tVKpx.
Books about Data Mining: Some Free, Some for Fee
July 14, 2016
If you want to load up on fun beach reading, I have a suggestion for you, gentle reader. KDNuggets posted “60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning, Python, R, and More.” The list does contain books about data mining and a number of other subjects. You will have to read the list and figure out which titles are germane to your interests. A number of the books include a helpful Amazon link. If you click on the hyperlink you may get a registration form, a PDF of the book, or this message:
Stephen E Arnold, July 14, 2016
What Could Possibly Go Wrong?
July 13, 2016
The Atlantic’s article “Technology, The Faux Equalizer” describes how technology is limited to the very wealthy and does not level the playing field. In some ways new technology can be a nuisance to the average person trying to scratch out a living in an unfriendly economy. Self-driving cars are one fear, but did you ever think bankers and financial advisors would have to compete with algorithms? The International Business Times shares, “Will Financial Analysts Lose Their Jobs To Intelligent Trading Machines?”
Machine learning software can crunch numbers faster and extrapolate more patterns than a human. Hedge fund companies have hired data scientists, physicists, and astronomers to remove noise from data and help program the artificial intelligence software. The article used Bridgewater Associates as an example of a financial institution making strides in automating banking:
“Using Bridgewater as an example, Sutton told IBTimes UK: ‘If you look at their historic trading strategies, it’s been very much long-term bets around what’s happening at a macro level. They have built their entire business on having some of the best research and analytics in the industry and some of the smartest minds thinking on that. When you combine those two things, I would definitely expect artificial intelligence to be applied to identify large-scale trades that might not be evident to an individual researcher.’”
Developing artificial intelligence for the financial sector has already drawn the attention of private companies and could lead to a 30% loss of jobs due to digitization. It would give financial companies a greater range of information with which to advise their clients on wise financial choices, but it could also mean these institutions lose talent, since part of the analyst’s role was to groom more talent.
These will probably be more potential clients for IBM’s Watson. We should all just give up now and hail our robot overlords.
Whitney Grace, July 13, 2016