Microsoft Acquires Revolution Analytics
February 5, 2015
The article titled Revolution Analytics Joins Microsoft on the Revolution blog makes a case for an open-source company partnering with Microsoft. Revolution Analytics is a provider of software for R, the leading programming language for statistical computing and predictive analytics. Between supporting Hadoop, working with Linux, and open-sourcing REEF and .NET Core, Microsoft is no stranger to open source. The article goes on with more examples:
“Microsoft has been an active participant in many other open source projects, too. There are over 1,600 OSS projects from Microsoft on CodePlex and GitHub. Microsoft engineers have actively contributed to the Linux kernel for years, and the company has contributed to open source community projects including Chef, Puppet, Docker, MongoDB, Redis and OpenJDK. Microsoft blogs regularly provide information and resources for open-source tools, including Chef, Puppet and Docker.”
Before the acquisition, Microsoft was already working with Revolution Analytics, for example on the matchmaking capabilities of the Xbox online gaming service. The article promises Revolution Analytics users that there will be no interruption or changes in services. It also anticipates that the acquisition will increase the number of users and allow Revolution Analytics to invest more time and energy in ongoing work such as the R Project and the Revolution R products.
Chelsea Kerwin, February 05, 2015
Sponsored by ArnoldIT.com, developer of Augmentext
Delve Assists Organizations with Content Curation
February 5, 2015
In a move à la Pinterest, Microsoft has added a feature to Delve that allows users to organize their cards via “boards.” An eWeek article has the details: “Microsoft Office Delve Boards Help Enterprises With Content Curation.”
The article begins:
“Delve is a mobile-optimized app that automatically surfaces situationally relevant information and interactions on ‘cards,’ visual and sharable representations of documents, discussions and other content shared over the Office 365 platform. It is powered by Office Graph, content discovery and machine learning software that the company described as the ‘new Office 365 intelligence fabric’ when it first announced the technology last year.”
Delve is a way for users to view relevant but potentially buried information without having to work too hard for it. It is part of Microsoft’s latest attempts to be a little more intuitive and user friendly while still maintaining its role as the enterprise giant. Stephen E. Arnold covers the latest on Microsoft and many things search on his Web service, ArnoldIT.com. His SharePoint feed is quite useful for those who are following the latest trends in enterprise search.
Emily Rae Aldridge, February 05, 2015
San Francisco, Patents, and Theft: Search Innovation?
February 5, 2015
Short honk: I read “A Bizarre Statistical Fact about Patents in San Francisco.” Some statistics professors might wrinkle their brows at the apparent correlation. Nevertheless, here’s the quote I noted:
What I’m suggesting is that this giant spike in patent rates is reflecting the combination of innovation and theft. Consider that many patents are used by the wealthier classes as a way to bilk people out of money. There’s the obvious case where patent trolls buy up overbroad patents— often in software — and threaten people with lawsuits until they pay to license a dubious patent from the troll. But patents also allow big companies to block small businesses from innovating, by charging astronomical prices to license really basic ideas or software functions. Especially in Silicon Valley, patents are often a game played by wealthy businesses, to the detriment of small-time entrepreneurs and teams of inventors.
I wonder if there is an impact on search innovation?
Stephen E Arnold, February 5, 2015
Twitter Loves Google Again and for Now
February 5, 2015
I have been tracking Twitter search for a while. There are good solutions, but these require some heavy lifting. The public services are hit and miss. Have you poked into the innards of TweetTunnel?
I read “Twitter Strikes Search Deal with Google to Surface Tweets.” Note that this link may require payment for access or may have gone dead. According to the news story:
The deal means the 140-character messages written by Twitter’s 284 million users could be featured faster and more prominently by the search engine. The hope is that greater placement in Google’s search results could drive more traffic to Twitter, which could one day sell advertising to these visitors when they come to the site, or more important, entice them to sign up for the service.
Twitter wants to monetize its content. Google wants to sell ads.
The only hitch in the git along is that individual tweets are often less useful than collections of tweets organized by person, tag, or some other index point. A query for a single tweet can be darned misleading. Consider running a query on the Twitter search engine. Enter the term “thunderstone”. What do you get? Games. What about the search vendor Thunderstone? Impossible to find, right?
To get full utility from Twitter, one may want to license the Twitter stream from an authorized vendor and pump the content into a next generation information access system. Useful outputs result for many concepts.
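Here is a minimal sketch of what that processing can look like, assuming the licensed feed arrives as simple records. The field names and the sample tweets are hypothetical for illustration and do not reflect any particular vendor’s schema.

```python
# Minimal sketch: group licensed tweet data by author and hashtag so that
# queries return subsets of tweets rather than single, ambiguous messages.
# The record format (dicts with "user", "text", "hashtags" keys) is an
# assumption for illustration, not any vendor's actual feed schema.
from collections import defaultdict

def index_tweets(tweets):
    by_author = defaultdict(list)
    by_tag = defaultdict(list)
    for tweet in tweets:
        by_author[tweet["user"].lower()].append(tweet["text"])
        for tag in tweet.get("hashtags", []):
            by_tag[tag.lower()].append(tweet["text"])
    return by_author, by_tag

# Example: the ambiguous query "thunderstone" becomes more useful when the
# tweets are already grouped by author or tag.
sample = [
    {"user": "GameFan", "text": "Loving the Thunderstone card game",
     "hashtags": ["thunderstone", "games"]},
    {"user": "SearchWatcher", "text": "Reading about the search vendor Thunderstone",
     "hashtags": ["thunderstone", "search"]},
]
by_author, by_tag = index_tweets(sample)
print(by_tag["thunderstone"])  # both tweets, ready for further filtering
```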
For more about NGIA systems and processing large flows of real time information, see CyberOSINT: Next Generation Information Access. Reading an individual tweet is often less informative than examining subsets of tweets.
Stephen E Arnold, February 5, 2015
Enterprise Search: NGIA Vendors Offer Alternative to the Search Box
February 4, 2015
I have been following the “blast from the past” articles that appear on certain content management oriented blogs and news services. I find the articles about federated search, governance, and knowledge related topics oddly out of step with the more forward looking developments in information access.
I am puzzled because the keyword search sector has been stuck in a rut for many years. The innovations touted in the consulting jargon of some failed webmasters, terminated in-house specialists, and frustrated academics are old, hoary with age, and deeply problematic.
There are some facts that cheerleaders for the solutions of the 1970s, 1980s, and 1990s choose to overlook:
- Enterprise search typically means a subset of the content required by an employee to perform work in today’s fluid and mobile work environment. The mix of employees and part timers translates to serious access control work. Enterprise search vendors “support” an organization’s security systems in the manner of a consulting physician to heart surgery: they provide inputs but take no responsibility.
- The costs of configuring, testing, and optimizing an old school system are usually higher than the vendor suggests. When the actual costs collide with the budgeted costs, the customer gets frisky. Fast Search & Transfer’s infamous revenue challenges came about in part because customers refused to pay when the system was not running and working as the marketers suggested it would.
- Employees cannot locate needed information and do not like the interfaces. The information is often “in” the system but not in the indexes. And if it is in the indexes, the users cannot figure out which combination of keywords unlocks what’s needed. The response is, “Who has time for this?” When a satisfaction measure is taken, somewhere between 55 and 75 percent of the search system’s users report they don’t like it very much.
Obviously organizations are looking for alternatives. Some use open source solutions, which are good enough. Other organizations put up with Windows’ built-in search tools, which are also good enough. More important software systems, like enterprise resource planning or accounting packages, come with basic search functions. Again: these are good enough.
The focus of information access has shifted from indexing a limited corpus of content using a traditional solution to a more comprehensive, automated approach. No software is without its weaknesses. But compared to keyword search, there are vendors pointing customers toward a different approach.
Who are these vendors? In this short write up, I want to highlight the type of information about next generation information access vendors in my new monograph, CyberOSINT: Next Generation Information Access.
I want to highlight one vendor profiled in the monograph and mention three other vendors in the NGIA space that are not included in the first edition of the report but for which I have reports available for a fee.
I want to direct your attention to Knowlesys, an NGIA vendor operating in Hong Kong and the Nanshan District of Shenzhen. On the surface, the company processes Web content. The firm also provides a free download of its scraping software, which is beginning to show its age.
Dig a bit deeper, and Knowlesys provides a range of custom services. These include deploying, maintaining, and operating next generation information access systems for clients. The company’s system can automatically process and make available content from internal, external, and third-party providers. Access is available via standard desktop computers and mobile devices:
Source: Knowlesys, 2014.
The system handles both structured and unstructured content in English and a number of other languages.
The company does not reveal its clients and the firm routinely ignores communications sent via the online “contact us” mail form and faxed letters.
How sophisticated is the Knowlesys system? Compared to the other 20 systems analyzed for the CyberOSINT monograph, my assessment is that the company’s technology is on a par with that of other vendors offering NGIA systems. The plus of the Knowlesys system, if one can obtain a license, is that it handles Chinese and other ideographic languages as well as the Romance languages. The downside is that for some applications, the company’s location in China may be a consideration.
IBM and Layoffs: Watson, Watson, Where Are You?
February 4, 2015
For months I have been commenting on the increasingly weird marketing pitches for IBM Watson. This is the Lucene and home-grown script system positioned as the next big thing in information retrieval. The financial goals for this system were crazy. My recollection is that IBM wanted to generate a billion dollars in revenue from open source search and bits and pieces of the IBM technology lumber.
Impossible. Having a system ingest bounded content and then answer “questions” about that content is neither new, remarkable, nor particularly interesting to me. When the system is presented as a way to solve the problem of cancer and generate barbeque sauce with tamarind, the silliness points to desperation.
IBM marketers were trying everything to make open source search into a billion dollar baby and pull off the stunt quickly. Keep in mind that Autonomy required 15 years and a number of pretty savvy acquisitions to nose into the $700 million range.
IBM, in its confused state, believed that it could do the trick in a fraction of the time. IBM apparently was unaware of the erratic thinking at Hewlett Packard, which spent $11 billion for Autonomy and wanted to generate billions from that system at the same time IBM was going to collect a billion or more from the same market.
Both of these companies, dazed by a long term struggle with spreadsheet fever, were ignoring or simply did not understand the doldrums of the enterprise information access market. Big companies were quite happy to give open source solutions a try. Vendors of proprietary systems were pitching their keyword systems as everything from customer support “solutions” to business intelligence systems that would “predict” what the company should know.
Yep, right.
I read with some sadness the posts at Alliance@IBM. The viewpoint is not that of IBM management, which is now firing or “resource allocating” its way through its staff. I am not sure how many folks are going to be terminated, but the comments from IBM employees suggest that the staff are unhappy. Some may not go gentle into that good night.
The point is that the underlying problems at IBM were evident in the silly Watson marketing. An organization that can with a straight face suggest that a next generation information access system can discover a new recipe provides a glimpse into an organization’s disconnect at a fundamental level.
Too bad. The stock buybacks, the sale of manufacturing assets, and the assertions that a mainframe is a mobile platform tell me that IBM stockholders may want to reevaluate those holdings.
If IBM asked Watson, I would question the outputs.
Stephen E Arnold, February 4, 2015
Deep Learning: Concepts Are Difficult
February 4, 2015
Quote to note: I read “Google Brain’s Co-Inventor Tells Why He’s Building Chinese Neural Networks.”
Here’s the quote I noted:
Deep learning algorithms are very good at one thing today: learning input and mapping it to an output. X to Y. Learning concepts is going to be hard.
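To make the “X to Y” point concrete, here is a minimal, self-contained sketch of an input-to-output mapping being learned. The numbers are invented for illustration and have nothing to do with Google’s or Baidu’s systems.

```python
# Minimal sketch of "X to Y" supervised learning: fit a single weight w so
# that w * x approximates y. This toy gradient descent illustrates the
# quote's point that today's algorithms learn input-to-output mappings;
# it says nothing about learning concepts.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.0]   # roughly y = 2x, a fixed mapping to learn

w = 0.0
learning_rate = 0.01
for _ in range(1000):
    # gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad

print(round(w, 2))  # close to 2.0: the X-to-Y mapping has been learned
```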
I support research projects. I don’t support marketers who ignore facts for hype. Also note that the co-inventor, Andrew Ng, jumped from Google to Baidu. The human brain can figure out compensation and opportunity, I suggest.
Stephen E Arnold, February 3, 2015
Google Search Parameters Revealed
February 4, 2015
Short honk: One of my two or three readers alerted me to a useful summary of Google’s search parameters. The list is available from BackLinkSentry. Google’s advanced search page is helpful but not particularly fine grained. This list of switches will unlock content that is in the Google index but not findable with a two or three word query slapped into the Google search box. Worth having at one’s fingertips, in my opinion.
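As an illustration of how such switches combine with ordinary query operators, here is a short, hypothetical example. The parameter names shown are among those commonly documented for Google’s results URL, but the BackLinkSentry list is the place to check, since Google can change or drop parameters without notice.

```python
# Illustrative sketch of combining Google URL parameters with query operators.
# The parameters used (num, as_qdr) are examples from commonly documented
# lists; consult the linked summary for the full set.
from urllib.parse import urlencode

params = {
    "q": 'site:gov filetype:pdf "enterprise search"',  # operators go inside the query
    "num": 50,        # ask for up to 50 results per page
    "as_qdr": "m6",   # restrict results to roughly the past six months
}
url = "https://www.google.com/search?" + urlencode(params)
print(url)
```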
Stephen E Arnold, February 3, 2015
Twitter Relies on Bing’s Translation Engine to Offer Tweet Translations
February 4, 2015
The article on Search Engine Journal titled Twitter Teams Up With Bing To Offer Translated Tweets expands on Twitter’s announcement that it is bringing back the translation of tweets. The project was abandoned in 2013 but has returned with the assistance of Bing’s translation engine. While the service is not without flaws, the article suggests that it beats no translation ability at all.
“The company admits that the service is far from perfect and still needs to be worked on: ‘… the results still vary and often fall below the accuracy and fluency of translations provided by a professional translator.’
While the service no doubt leaves something to be desired, it’s still an improvement over the zero built-in options they had before. Bing’s translation engine works with more than 40 language pairs, and is currently available on Twitter.com, Twitter for iOS and Android, and TweetDeck.”
If you are interested in setting up the translator, you need to change your account settings to “Show Tweet translations.” Once this has been done, clicking the globe icon will show a translation of the original text. Since the company already concedes that this is not the work of a professional translator, we can only wonder how any translation service will handle the fluidity of abbreviations and slang on Twitter.
Chelsea Kerwin, February 04, 2015
Sponsored by ArnoldIT.com, developer of Augmentext
Current.ly App Enables Improved Search on Twitter
February 4, 2015
The review on KillerStartups titled Finally! An Effective Way to Filter Twitter! discusses Current.ly and its algorithm for sorting through the noise on Twitter. Unlike Facebook, the article mentions, Twitter has avoided the use of filters, opting for the chaos of every tweet for itself. Beyond following specific conversations or searching via hashtag, there are few effective methods for organizing and finding relevant tweets. Current.ly offers a solution:
“Current.ly not only presents the most timely topics front and center on both their mobile-optimized site and app but also lets you search for topics that interest you, again presenting the most relevant tweets before the general jibber-jabber. It’s a great solution for anyone who wants to keep up on the conversations around current events but for whom even the thought of opening Twitter’s main feed makes them sigh with frustration.”
This would improve on the hashtag search function, which still presents a mess of tweets. Current.ly’s search algorithm promises to bring the more relevant tweets to the forefront. An additional sweetener for many Twitter users: Current.ly is not limited to the United States. It allows the user to pick among the US, the UK, the Netherlands, Spain, and Mexico. The article surmises that this list will grow as the app becomes more popular.
Chelsea Kerwin, February 04, 2015
Sponsored by ArnoldIT.com, developer of Augmentext