My Feed Personalization a Step Too Far
September 8, 2017
In an effort to be even more user-friendly and to further encourage a narcissistic society, Google now allows individuals to ‘follow’ or ‘unfollow’ topics, delivered daily to devices, as they deem them interesting or uninteresting. SEJ explains the new feature, which is billed as an enhancement of Google’s ‘my feed,’ the company’s attempt to personalize news.
As explained in the article,
Further advancements to Google’s personalized feed include improved machine learning algorithms, which are said to be more capable at anticipating what an individual may find interesting. In addition to highlighting stories around manually and algorithmically selected topics of interest, the feed will also display stories trending in your area and around the world.
That seems like a great way to keep people current on topics ranging geographically, politically and culturally, but with the addition of ‘follow’ or ‘unfollow’, once again, individuals can reduce their world to a series of pop-star updates and YouTube hits. Isn’t it contradictory to suggest topics and stories in an effort to keep an individual informed of the world around them, yet allow them to switch off any suggestion that seems boring or unfamiliar? Now, Google, you can do better.
Catherine Lamsfuss, September 15, 2017
Google May Buy a Phone Maker: Again? Really?
September 8, 2017
I am not sure if this story “Google Is Apparently Ready to Buy Smartphone Maker HTC” is [a] an accurate business report, [b] fake news, [c] evidence of time travel, [d] a mis-timed April Fool’s joke.
You decide.
The write up recounts:
From a strategic standpoint, owning & operating its own mobile operating division would offset some of the key strategic challenges that Google’s mobile computing business might face: a) a deeper integration of hardware/software would offset some of the Android fragmentation issues that do not plague Apple iOS; b) development cycles that maximize forward mobile computing trends (Google Lens, location, ARCore, Google Assistant) with possible greater user adoption; c) an offset to rising Distribution TAC expenses; & d) an offset to any negative industry dynamics (unbundling of apps) resulting from the European Commission’s Android investigation.
We love the Google. We know it makes wise decisions. Hollywood may be planning a sequel to Groundhog Day.
Stephen E Arnold, September 8, 2017
Why Not Have One or Two Smart Software Platforms? Great Idea!
September 8, 2017
I have been writing about online for decades. In my Eagleton Lecture in the 1980s (I forget the year; I received an ASIS Award and had to give a talk as part of the deal), I pointed out that online concentrates. It is not just economy of scale: a single online service operates like a magnet. Once critical magnetism is achieved, other services clump around that central gizmo. There are fancy economic explanations; for example, economies of scale, convenience, utility, and so on.
Now couple the concentration with one of the properties of digital information flows and we have some exciting things to think about. In that Eagleton Lecture, I used the example of the telegraph to illustrate how even inefficient forms of information movement can rework social landscapes.
The point is that concentration and flows are powerful forces. When data flows through an institution, that institution comes apart. When online power is concentrated, the nature of “facts” changes. Even the unconscious decisions of a widely used online service alter how individuals perceive “facts” and set their priorities. (Are you checking your email now, gentle reader?)
When I read “Facebook and Microsoft collaborate to simplify conversions from PyTorch to Caffe2,” I thought about that decades old Eagleton Lecture of mine. The write up describes what seems to be a ho-hum integration play. Here’s a passage I highlighted:
The collaborative work Facebook and Microsoft are announcing helps folks easily convert models built in PyTorch into Caffe2 models. By reducing the barriers to moving between these two frameworks, the two companies can actually improve the diffusion of research and help speed up the entire commercialization process.
Just a tiny step, right?
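For the curious, here is a minimal sketch of the handoff the write up describes, assuming the ONNX interchange format the two companies introduced for the purpose; the toy model and file name are illustrative, not from the write up:

```python
# A minimal sketch of moving a PyTorch model toward Caffe2 via an
# interchange file; the model and path are illustrative.
import torch
import torch.nn as nn

# Stand-in for a model built and trained in PyTorch for research.
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
model.eval()

# A dummy input traces the computation graph so it can be serialized.
dummy_input = torch.randn(1, 10)

# Export a portable file; a Caffe2 (or other ONNX-aware) runtime can
# load it for production serving.
torch.onnx.export(model, dummy_input, "research_model.onnx")
```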
A few observations:
- Clumping is now evident between Google and Walmart
- Amazon and Microsoft are partnering to make digital gizmos play well together
- Fusion (both data and services) is the go-to idea for next-generation information access and analysis services.
The questions which seem interesting this morning are:
- Why do we need multiple artificial intelligence platforms? Won’t just one be more efficient and “better”?
- How will users understand reality when smart software seamlessly operates across what appear to be separate functions?
- What will regulators do to control clumping which will command the lion’s share of revenue, resources, and influence?
Yep, just a minor step with this PyTorch and Caffe2 deal.
Online can be exciting and transformative too.
Stephen E Arnold, September 8, 2017
Yet Another Digital Divide
September 8, 2017
Recommind sums up what happened at a recent technology convention in the article, “Why Discovery & ECM Haven’t, Must Come Together (CIGO Summit 2017 Recap).” Author Hal Marcus first explains that he has long challenged anyone who claimed to provide a complete information governance solution. He recently spoke at CIGO Summit 2017 about how to make information governance a feasible goal for organizations.
The problem with information governance is that there is no one simple solution, and projects tend to be self-contained with only one goal: data collection, data reduction, and so on. In his talk, he gave five main reasons there is not one comprehensive solution: projects take a long time to scope and complete, data can come from multiple streams, mass-scale indexing is challenging, analytics help only when humans interpret the results, and risk and cost put a damper on projects.
Yet we are closer to a solution:
- Corporations seem to be dedicating more resources for data reduction and remediation projects, triggered largely by high profile data security breaches.
- Multinationals are increasingly scrutinizing their data sharing and retention practices, spurred by the impending May 2018 GDPR deadline.
- ECA for data culling is becoming more flexible and mature, supported by the growing availability and scalability of computing resources.
- Discovery analytics are being offered at lower, all-you-can-eat rates, facilitating a range of corporate use cases like investigations, due diligence, and contract analysis.
- Tighter, more seamless and secure integration of ECM and discovery technology is advancing and seeing adoption in corporations, to great effect.
And it always seems farther away.
Whitney Grace, September 8, 2017
How Search Moves Forward
September 8, 2017
Researchers at UT Austin are certainly into search engines, and are eager to build improved neural models. The piece “The Future of Search Engines” at Innovation Toronto examines two approaches, suggested by associate professor Matthew Lease, to create more effective information retrieval systems. The article begins by describing how search engines currently generate their results:
The outcome is the result of two powerful forces in the evolution of information retrieval: artificial intelligence — especially natural language processing — and crowdsourcing. Computer algorithms interpret the relationship between the words we type and the vast number of possible web pages based on the frequency of linguistic connections in the billions of texts on which the system has been trained. But that is not the only source of information. The semantic relationships get strengthened by professional annotators who hand-tune results — and the algorithms that generate them — for topics of importance, and by web searchers (us) who, in our clicks, tell the algorithms which connections are the best ones. Despite the incredible, world-changing success of this model, it has its flaws. Search engine results are often not as ‘smart’ as we’d like them to be, lacking a true understanding of language and human logic. Beyond that, they sometimes replicate and deepen the biases embedded in our searches, rather than bringing us new information or insight.
The first paper, Learning to Effectively Select Topics For Information Retrieval Test Collections (PDF), details a way to pluck and combine the best work of several annotators, professional and crowd-sourced alike, for each text. The Innovation Toronto article spends more time on the second paper, Exploiting Domain Knowledge via Grouped Weight Sharing with Application to Text Categorization (PDF). The approach detailed here taps into existing resources like WordNet, a lexical database for the English language, and domain ontologies like the Unified Medical Language System. See the article for the team’s suggestions on using weight sharing to blend machine learning and human knowledge.
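For a concrete sense of what tapping WordNet looks like, here is a minimal sketch using NLTK; how the gathered terms feed into grouped weight sharing is my gloss on the paper’s description, not its actual code:

```python
# A minimal sketch of pulling related terms from WordNet as outside
# domain knowledge; the weight-sharing hookup is an assumption.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def related_terms(word):
    """Gather synonyms and immediate hypernyms WordNet links to a word."""
    terms = set()
    for synset in wn.synsets(word):
        terms.update(lemma.name() for lemma in synset.lemmas())
        for hypernym in synset.hypernyms():
            terms.update(lemma.name() for lemma in hypernym.lemmas())
    return sorted(terms)

# Words grouped this way could share, or be regularized toward, the
# same model weights.
print(related_terms("retrieval"))
```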
The researchers’ work was helped by grants from the National Science Foundation, the Institute of Museum and Library Services, and the Defense Advanced Research Projects Agency, three government organizations hoping for improvements in the quality of crowdsourced information. We’re reminded that, though web-search companies do perform their own research, it is necessarily focused on commercial applications and short-term solutions. The sort of public investment we see at work here can pave the way to more transformative, long-term developments, the article concludes.
Cynthia Murrell, September 8, 2017
Short Honk: Recorded Future Write Up Highlights Dark Web Notebook
September 7, 2017
Nice surprise. Recorded Future published a short article in its blog called “Dark Web Explained: Shining a Light on Dark Web Activity.” A happy quack to the wizards at Recorded Future from the Beyond Search team.
Kenny Toth, September 7, 2017
Smart Software: An AI Future and IBM Wants to Be There for 10 Years
September 7, 2017
I read “Executives Say AI Will Change Business, but Aren’t Doing Much about It.” My takeaway: There is no there there—yet. I noted these “true factoids” waltzing through the MIT-charged write up:
- 20% of the 3,000 companies in the sample use smart software
- 5% use smart software “extensively” (No, I don’t know what extensively means either.)
- About one third of the companies in the sample “have an AI strategy in place.”
Pilgrims, that means there is money to be made in the smart software discontinuity. Consulting and coding are a match made in MBA heaven.
If my observation is accurate, IBM’s executives read the tea leaves and decided to contribute a modest $240 million for the IBM Watson Artificial Intelligence Lab at MIT. You can watch a video and read the story from Fortune Magazine at this link.
The Fortune “real” journalism outfit states:
This is the first time that a single company has underwritten an entire laboratory at the university.
However, the money will be paid out over 10 years. Lucky parents with children at MIT can look forward to undergrad, graduate, and post graduate work at the lab. No living in the basement for this cohort of wizards.
Several questions arise:
- Which institution will “own” the intellectual property of the wizards from MIT and IBM? What about the students’ contributions?
- How will US government research be allocated when there is a “new” lab which is funded by a single commercial enterprise? (Hello, MITRE, any thoughts?)
- Will young wizards who formulate a better idea be constrained? Might the presence or shadow of IBM choke off some lines of innovation until the sheepskin is handed over?
- Are Amazon, Facebook, Google, and Microsoft executives kicking themselves for not thinking up this bold marketing play and writing an even bigger check?
- Will IBM get a discount on space advertising in MIT’s subscription publications?
Worth monitoring, because other big-name schools might have a model to emulate. Company-backed smart software labs might become the next big thing to pitch for some highly regarded, market-oriented institutions. How much would Cambridge University or the stellar University of Louisville capture if they too “sold” labs to commercial enterprises? (Surprised at my inclusion of the University of Louisville? Don’t be. It’s an innovator in basketball recruiting and recruiting real estate mogul talent. Smart software is a piece of cake for this type of institution of higher learning.)
Stephen E Arnold, September 7, 2017
Old School Searcher Struggles with Organizing Information
September 7, 2017
I read a write up called “Semantic, Adaptive Search – Now that’s a Mouthful.” I cannot decide if the essay is intended to be humorous, plaintive, or factual. The main idea in the headline is that there is a type of search called “semantic” and “adaptive.” I think I know about the semantic notion. We just completed a six-month analysis of syntactic and semantic technology for one of my few remaining clients. (I am semi-retired, as you may know, but tilting at the semantic and syntactic windmills is great fun.)
The semantic notion has inspired such experts as David Amerland, an enthusiastic proponent of the power of positive thinking and tireless self-promotion, to heights of fame. The syntax idea gives experts in linguistics hope for lucrative employment opportunities. But most implementations of these hallowed “techniques” deliver massive computational overhead and outputs which require legions of expensive subject matter experts to keep on track.
The headline is one thing, but the write up is about another topic in my opinion. Here’s the passage I noted:
The basic problem with AI is no vendor is there yet.
Okay, maybe I did not correctly interpret “Semantic, Adaptive Search—Now That’s a Mouthful.” I just wasn’t expecting artificial intelligence, a very SEO type term.
But I was off base. The real subject of the write up seems to be captured in this passage:
I used to be organized, but somehow I lost that admirable trait. I blame it on information overload. Anyway, I now spend quite a bit of time searching for my blogs, white papers, and research, as I have no clue where I filed them. I have resorted to using multiple search criteria. Something I do, which is ridiculous, is repeat the same erroneous search request, because I know it’s there somewhere and the system must have misunderstood, right? So does the system learn from my mistakes, or learn the mistakes? Does anyone know?
Okay, disorganized. I would never have guessed without a title that references semantic and adaptive search, the lead paragraph about artificial intelligence, and this just cited bit of exposition which makes clear that the searcher cannot make the search systems divulge the needed information.
One factoid in the write up is that a searcher will use 2.73 terms per query. I think that number applies to desktop boat anchor searches from the Dark Ages of old-school querying. Today, more than 55 percent of queries are from mobile devices. About 20 percent of those are voice-based. Other queries just happen because a greater power like Google or Microsoft decides that what you “really” wanted is just the ticket. To me, the shift from desktop to mobile makes the number of search terms in a query a tough number to calculate. How does one compare data automatically delivered to a Google Map when one is looking for a route with an old-school query of 2.73 terms? Answer: you probably just use whatever number pops out from a quick Bing or Google search from a laptop and go with the datum in a hit on an ad-choked result list.
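To see why the number is slippery, consider how such an average gets computed; the toy log below is invented:

```python
# Back-of-the-envelope arithmetic for a "terms per query" average
# over typed queries; the sample log is invented.
queries = [
    "weather",
    "pizza near me",
    "semantic adaptive search explained",
]

avg_terms = sum(len(q.split()) for q in queries) / len(queries)
print(f"{avg_terms:.2f} terms per query")  # 2.67 for this toy log

# Voice queries, map taps, and zero-query pushes from Google or
# Microsoft never land in a log like this, so the average says little
# about how people actually obtain information today.
```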
The confused state of search and content processing vendors is evident in their marketing, their reliance on jargon and mumbo jumbo, and fuzzy thinking about obtaining information to meet a specific information need.
I suppose there is hope. One can embrace a taxonomy and life will be good. On the other hand, disorganization does not bode well for a taxonomy created by a person who cannot locate information.
Well, one can use smart software to generate those terms, the Use Fors and the See Alsos. One can rely on massive amounts of Big Data to save the day. One can allow a busy user of SharePoint to assign terms to his or her content. Many good solutions which make information access a thrilling discipline.
Now where did I put that research for my latest book, “The Dark Web Notebook”? Ah, I know. In a folder called “DWNB Research” on my back up devices with hard copies in a banker’s box labeled “DWNB 2016-2017.”
Call me old-fashioned, but the semantic, syntactic, artificially intelligent razzmatazz underscores the triumph of jargon over systems and methods which deliver on-point results in response to a query from a person who knows what he or she seeks.
Plus, I have some capable research librarians to keep me on track. Yep, real humans with MLS degrees, online research expertise, and honest-to-god reference desk experience.
Disorganization, arm waving, and toots from the jargon tuba do not add up to smart software.
Stephen E Arnold, September 7, 2017
A New and Improved Content Delivery System
September 7, 2017
Personalized content and delivery is the name of the game in PRWEB’s “Flatirons Solutions Launches XML DITA Dynamic Content Delivery Solutions.” Flatirons Solutions is a leading XML-based publishing and content management company, and it recently released its Dynamic Content Delivery Solution. The solution uses XML-based technology to deliver more personalized content to enterprises and is advertised as reducing publishing and support costs. It is built on the Mark Logic Server.
By partnering with Mark Logic and incorporating their industry-leading XML content server, the solution conducts powerful queries, indexing, and personalization against large collections of DITA topics. For our clients, this provides immediate access to relevant information, while producing cost savings in technical support, and in content production, maintenance, review and publishing. So whether they are producing sales, marketing, technical, training or help documentation, clients can step up to a new level of content delivery while simultaneously improving their bottom line.
The Dynamic Content Delivery Solution is designed for government agencies and enterprises that publish XML content to various platforms and formats. Mark Logic is touted as a powerful tool to pool content from different sources, repurpose it, and deliver it to different channels.
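For a sense of the mechanics, here is a minimal sketch of per-audience filtering over DITA topics; the directory layout is illustrative, and a real deployment would query an XML content server like Mark Logic rather than flat files:

```python
# A minimal sketch of filtering DITA topics by audience metadata for
# personalized delivery; paths and the audience value are illustrative.
import glob
import xml.etree.ElementTree as ET

def topics_for_audience(topic_dir, audience_type):
    """Return titles of DITA topics whose <audience> metadata matches."""
    titles = []
    for path in glob.glob(f"{topic_dir}/*.dita"):
        root = ET.parse(path).getroot()
        # DITA topics can carry <audience type="..."/> in their prolog.
        for aud in root.iter("audience"):
            if aud.get("type") == audience_type:
                titles.append(root.findtext("title", default=path))
                break
    return titles

print(topics_for_audience("topics", "administrator"))
```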
MarkLogic finds success in its core use case: slicing and dicing for publishing. It is back to the basics for them.
Whitney Grace, September 7, 2017
Google Innovation Convoluted to Many
September 7, 2017
In a race against time, Google seems to be struggling to keep up with Apple in many categories, messaging and video chat to name just two. A recent Phandroid article called out Google on its multiple fails over the years in its quest to dominate Apple.
The primary criticism is Google’s lack of a comparable messaging system. As the article explains,
Right now, Google’s solution for handling messaging for the average user is looking a lot like the early 90s landscape for all those competing messaging services. But at least those services were competing with one another. Google’s messaging services cannibalize one another as Google meanders down its course of attempting to find an iMessage solution in the wake of its upheavals.
Although the folks at Phandroid do make good points about Google’s identity crisis, they leave out many other innovations that, although possibly missteps, are moving things forward. One such development is the introduction of YouTube Messenger, which might seem redundant to many but also answers many of the problems Phandroid mentions.
Catherine Lamsfuss, September 7, 2017