Internet Archive: The Bono Books
October 16, 2017
I read “Books from 1923 to 1941 Now Liberated!” The collection is based on books which libraries can scan. The write-up explains the provision of US copyright law that makes these books eligible for inclusion in the Internet Archive. Hopefully libraries will find the resources to contribute books. I did some spot checks. One gap is history books. There are others. This is an excellent effort. The interface to the Bono books retains the Internet Archive’s unique approach to navigation; for example, clicking on a book displays the scanned pages. Clicking on a page turns the page. The outside edge of the scanned image allows one to “jump” to a particular page. Getting back to a book’s table of contents takes a bit of effort, however. Those looking for anthologies can find a collection of 20th-century poetry by hunting. The search system is just good enough. Worth checking out. Libraries, scan those history books. Who doesn’t love Theodor Mommsen’s early work?
Stephen E Arnold, October 16, 2017
Understanding Intention: Fluffy and Frothy with a Few Factoids Folded In
October 16, 2017
Introduction
One of my colleagues forwarded me a document called “Understanding Intention: Using Content, Context, and the Crowd to Build Better Search Applications.” To get a copy of the collateral, one has to register at this link. My colleague wanted to know what I thought about this “book” by Lucidworks. That’s what Lucidworks calls the 25-page marketing brochure. I read the PDF file and was surprised at what I perceived as fluff, not facts or a cohesive argument.
The topic was of interest to my colleague because we completed a five-month review and analysis of “intent” technology. In addition to two white papers about using smart software to figure out and tag (index) content, we had to immerse ourselves in computational linguistics, multi-language content processing technology, and semantic methods for “making sense” of text.
The Lucidworks document purported to explain intent in terms of content, context, and the crowd. The company explains:
With the challenges of scaling and storage ticked off the to-do list, what’s next for search in the enterprise? This ebook looks at the holy trinity of content, context, and crowd and how these three ingredients can drive a personalized, highly-relevant search experience for every user.
The presentation of “intent” was quite different from what I expected. The details of figuring out what content “means” were sparse. The focus was not on methodology but on selling integration services. I found this interesting because I have Lucidworks in my list of open source search vendors. These are companies which repackage open source technology, create some proprietary software, and assist organizations with engineering and integration services.
The book was an explanation anchored in buzzwords, not the type of detail we expected. After reading the text, I was not sure how Lucidworks would go about figuring out what an utterance might mean. The intent-centric systems we reviewed over the course of five months followed several different paths.
Some companies relied upon statistical procedures. Others used dictionaries and pattern matching. A few combined multiple approaches in a content pipeline. Our client, a firm based in Madrid, focused on computational linguistics plus a series of procedures which combined proprietary methods with “modules” to perform specific functions. The idea for this approach was to raise intent-identification accuracy from the 65 to 80 percent range to accuracy approaching and often exceeding 90 percent. For text processing in multi-language corpuses, the Spanish company’s approach was a breakthrough.
I was disappointed but not surprised that Lucidworks’ approach was breezy. One of my colleagues used the word “frothy” to describe the information in the “Understanding Intention” document.
As I read the document, which struck me as a shotgun marriage of generalizations and examples of use cases in which “intent” was important, I made some notes.
Let me highlight five of the observations I made. I urge you to read the original Lucidworks document so you can judge Lucidworks’ arguments for yourself.
Imitation without Attribution
My first reaction was that Lucidworks had borrowed conceptually from ideas articulated by Dr. Gregory Grefenstette and his book Search Based Applications: At the Confluence of Search and Database Technologies. You can purchase this 2011 book on Amazon at this link. Lucidworks’ approach borrowed some of the analysis but, unlike Dr. Grefenstette’s, did not include the detail which supports the increasing importance of using search as a utility within larger information access solutions. Without detail, the Lucidworks document struck me as a description of the type of solutions that a company like Tibco is now offering its customers.
Big Data and Big Money Are on a Collision Course
October 16, 2017
A recent Forbes article has started us thinking about the similarities between long-haul truckers and Wall Street traders. Really! The editorial penned by JP Morgan, “Informing Investment Decisions Using Machine Learning and Artificial Intelligence,” showcases the many ways in which investing is about to be overrun with big data machines. Depending on your stance, it is either thrilling or frightening.
The story claims:
Big data and machine learning have the potential to profoundly change the investment landscape. As the quantity and the access to data available have grown, many investors continue to evaluate how they can leverage data analysis to make more informed investment decisions. Investment managers who are willing to learn and to adopt new technologies will likely have an edge.
Sounds an awful lot like the news we have been reading recently about how almost two million truck drivers could be out of work in the next decade thanks to self-driving vehicles. If you have money in trucking, the amount saved is amazing, but if that’s how you make your living, things have suddenly become chilly. Sounds like the future of Wall Street, according to this story.
It continues:
Big data and machine learning strategies are already eroding some of the advantage of fundamental analysts, equity long-short managers and macro investors, and systematic strategies will increasingly adopt machine learning tools and methods.
If you ask us, it’s not a matter of if but when. Nobody wants to lose a job to efficiency, but the trend is pretty much impossible to stop. Money talks, and saving money talks loudest to companies and business owners, like investment firms.
Patrick Roland, October 16, 2017
The Cloud Needs EDiscovery Like Now
October 16, 2017
Cloud computing has changed the way home and enterprise systems store and access data. One of the problems with cloud computing, however, is the lack of a powerful eDiscovery tool. There are search tools for the cloud, but eDiscovery tools help users make sense of their content. Compare The Cloud reports that there is a new eDiscovery tool to improve the cloud, “KrolLDiscovery Brings End-To-End eDiscovery To The Cloud With Nebula.” Nebula is the name of KrolLDiscovery’s eDiscovery tool, and it is an upgrade of eDirect365, building on the software’s processing and review capabilities.
Nebula was designed with a user-friendly eDiscovery approach that simplifies otherwise complex tasks. Nebula is also a web-based application and it can be accessed from most browsers and mobile devices. The benefit for Windows users is that it can be deployed within Windows Azure to bring scalability and rapid deployment capabilities.
KrolLDiscovery is proud of its newest product:
‘We are excited for the future of Nebula,’ said Chris Weiler, President and CEO of KrolLDiscovery. ‘Expanding our eDiscovery capabilities to the cloud is a benefit to our multi-national and international clients as they can now process, store and access their data across the globe. All the while, we are dedicated to providing the same industry-leading service we are known for by our clients.’
Nebula was designed to improve how users interact with and use their content on a cloud-based system. Cloud computing has a real-time and portable air about it, but its weaknesses lie in lag and security. Perhaps Nebula will address the lag, making its other weaknesses a mere shadow of the past.
Whitney Grace, October 16, 2017
Are Facebook and Google News Organizations, Not Just News Linkers?
October 13, 2017
I read “UK Leapfrogs U.S. on Regulating Bad Content.” The point that the UK may regulate Facebook and Google as news organizations is important. Facebook and Google have been drivers of the “news” for years. The companies have been viewed as magic Silicon Valley disrupters. News organizations are neither magic nor particularly disruptive. But the most interesting information in the article, in my opinion, is this statement:
The speed of the UK’s actions means that the United States is falling behind, even though it was the U.S. election nearly a year ago that drew the most attention to the issue. While May and her Cabinet rush to stop fake news and terrorism from spreading — threatening steep fines — in the United States, Congress has yet to hold its first hearing with the companies. Silicon Valley has also responded more slowly stateside than in Europe.
Interesting notion, that “falling behind.”
Stephen E Arnold, October 13, 2017
Skepticism for Google Micro-Moment Marketing Push
October 13, 2017
An article at Street Fight, “The Fallacy of Google’s ‘Micro-Moment’ Positioning,” calls out Google’s “micro-moments” for the gimmick that it is. Here’s the company’s definition of the term it just made up: “an intent-rich moment when a person turns to a device to act on a need—to know, go, do, or buy.” In other words, any time a potential customer has a need and picks up a smartphone looking for a solution. For Street Fight’s David Mihm and Mike Blumenthal, this emphasis seems like a distraction from the failure of Google’s analytics to provide a well-rounded view of the online consumer. In fact, such oversimplification could hurt businesses that buy into the hype. In their dialogue format, they write:
David: [The term “micro-moments”] reduces all consumer buying decisions to thoughtless reflexes, which is just not reality, and drives all creative to a conversion-focused experience, which is only appropriate for specific kinds of keywords or mobile scenarios. It’s totally IN-appropriate for display or top-of-funnel advertising. I also think it’s intended to create a bizarre sense of panic among marketers — “OMG, we have to be present at every possible instant someone might be looking at their phone!” — which doesn’t help them think strategically or make the best use of their marketing or ad spend.
Mike: I agree. If you don’t have a sound, broad strategy no micro management of micro moments will help. To some extent I wonder if Google’s use of the term reflects the limits of their analytics to yet be able to provide a more complete picture to the business?
David: Sure, Google is at least as well-positioned as Amazon or Facebook to provide closed-loop tracking of purchase behavior. But I think it reflects a longstanding cultural worldview within the company that reduces human behavior to an algorithm. “Get Notification. Buy Thing.” or “See Ad. Buy Thing.” That may work for the “head” of transactional behavior but the long tail is far messier and harder to predict. Much as Larry Page would like us to be, humans are never going to be robots.
Companies that recognize the difference between consumers and robots have a clear edge in this area, no matter how Google tries to frame the issue. The authors compare Google’s blind spot to Amazon’s ease-of-use emphasis, noting the latter seems to better understand where customers are coming from. They also view the recent alliance between Google and Walmart to provide “voice-activated shopping” with some skepticism. See the article for more of their reasoning.
Cynthia Murrell, October 13, 2017
Everyone and Their Dog Is a Search Expert
October 13, 2017
Young people get frustrated when they help older people with technology. There are sighs and rolled eyes, and the situation often ends in yelling. One frustration young people are forced to deal with is teaching an older person how to use a search engine. Explaining how to enter information into the text box, what keywords mean, and how to tell the difference between results is not easy. However, search engines like Google, Bing, and Yandex try to make the search process as easy as possible so everyone can become a search expert.
Learning how to search is not the only thing people have trouble learning. Tech Viral wrote about the top “how to” searches in the article, “Here Are The Top 100 ‘How To’ Searches That People Want To Know.” Xaquin GV researched how people use Google as the answer-all “how to” tool and discovered the most popular searches. Among the top “how to” searches are how to make money, how to tie a tie, how to draw, how to kiss, how to lose weight, how to make pancakes, and how to get pregnant.
The essay also examines the top 100 ‘How to’ searches conducted worldwide, and the results are very illustrative. Xaquin divided those searches into categories, with visual representations of how popular each of them is.
The searches mostly revolve around adult responsibilities, along with a few surprises tied to current trends. Everyone can become an expert at any activity with a few simple keystrokes and tutorial guides. YouTube makes “how to” guides more helpful and even more dangerous when people try to copy the experts at parkour, skateboarding, and daredevil activities that should never be tried at home, kids.
Whitney Grace, October 13, 2017
Google Home: A Content Vacuum?
October 12, 2017
I read “Google Is Nerfing All Home Minis Because Mine Spied on Everything I Said.” The write-up is interesting because it documents a Google product which has a flaw; that is, the Google Home device in question acts like a content vacuum cleaner. The device allegedly copies what it hears without the user’s permission. Google continues to assure me that it wants to do “better.” I think that doing better is a great idea, particularly when a smart assistant functions as a listening and recording device in a way that surprises a user. The original post cited above contains some nice words for Google, screenshots, and a gentle presentation of the alleged spy function. The European Union may find this device an interesting one to evaluate for privacy regulation compliance. I think “nerf” as a verb means “kill” or more colloquially “brick”; that is, the digital equivalent of shooting a horse. Alexa, what does nerfing mean? I think it means that Google is killing this “great idea.”
Stephen E Arnold, October 12, 2017
Hewlett Packard and Code Reviews: Micro Focus Policy Shift
October 12, 2017
I noted that Hewlett Packard Enterprise allowed Russia to perform a code review. The software under “review” performs some security-related functions. HPE is no longer in the software business after its sale of Autonomy to Micro Focus earlier this year and the somewhat interesting hiving of the HPE Micro Focus stake to the creatively named Seattle SpinCo in August 2017.
Micro Focus, according to Reuters, announced on October 9, 2017, that it would no longer permit code reviews by what Reuters called “high risk” governments. Prompt action for a giant roll-up of different companies and their technologies. Somebody at Micro Focus put the pedal to the metal for this policy change. Maybe Micro Focus’ UK customers were even less enthusiastic about the code review than US officials?
I am not sure what to make of HPE’s action, but on the surface, Micro Focus appears to be scrambling to contain the issue.
I took a quick look at Micro Focus and turned up a number of pointers to a company called Entit Software. This is a company with which I am not familiar. Entit has a number of offices, including one which looks pretty close to Hewlett Packard in Silicon Valley.
What’s amusing about this story is that HPE seems to be executing a complex paso doble combined with a down-home square dance. CNBC reported that “a White House cyber official called Russian review of Pentagon software problematic.” That seems like a criticism of HPE from my vantage point in Harrod’s Creek.
Interesting executive decision making plus footprints from corporate intermediaries. Perhaps Autonomy was not the challenge for Hewlett Packard. HP may be its own storm system? Seattle SpinCo? Really? MBAs and lawyers should be more creative in my opinion.
Stephen E Arnold, October 12, 2017
Brief Configuration Error by Google Triggers Japanese Investigation
October 12, 2017
When a tech giant makes even a small mistake, consequences can be significant. A brief write-up from the BBC, “Google Error Disrupts Corporate Japan’s Web Traffic,” highlights this lamentable fact. We learn:
Google has admitted that wide-spread connectivity issues in Japan were the result of a mistake by the tech giant. Web traffic intended for Japanese internet service providers was being sent to Google instead.
Online banking, railway payment systems as well as gaming sites were among those affected.
A spokesman said a ‘network configuration error’ only lasted for eight minutes on Friday but it took hours for some services to resume. Nintendo was among the companies who reported poor connectivity, according to the Japan Times, as well as the East Japan Railway Company.
All of that content, financial transactions included, was gone for good, since Google does not provide transit to third-party networks, according to an industry expert cited in the post. Essentially, it seems that for those few minutes, Google accidentally hijacked all traffic to NTT Communications Corp., which boasts over 50 million customers in Japan. The country’s Ministry of Internal Affairs and Communications is investigating the incident.
Cynthia Murrell, October 12, 2017