Search and Math: Stuck in a Rut?

June 23, 2015

I read a paper several years ago. Okay, maybe it was in 2008. You can find the write up here. There is a more recent review of the information in that “Top 10 Algorithms in Data Mining” article here. To make a long story short, most of the search and content processing systems use these tried-and-true methods. Most of them can be implemented using the guidelines in computer science textbooks, and there are plenty of examples to ensure that none of the search and content processing systems fall prey to the Big O issue.

Against this background, I read with interest “The Top 10 Mathematical Achievements of the Last 5ish Years.” I like the specificity of “5ish.” Good math thinking in today’s fuzzy algorithm environment.

The idea is that the write up reviews math which sticks up like mountain tops above the cloud layer in the Peruvian Andes. Three of the 10 items snagged my interest, which is skewed by my bias toward search and content processing. Here are the three I highlighted from the 10 in the useful write up:

  • The bounded gaps between primes. Perhaps the approach will benefit those engaged in making and breaking cryptography in the next year or so?
  • Voevodsky’s Homotopy Type Theory. How can one go wrong with new thoughts on fundamental math?
  • Work on The Fundamental Lemma. Gimme some old time group/set religion with potentially useful new handles with which to grab groups.

Now how will search and content processing benefit? For now, not too much. The problem is that innovations in math cannot be applied to most of today’s information processing systems. There are computational considerations, and there are other tasks which need more attention than the plumbing; namely, how can a vendor get the system to output information a licensee can actually use in real life.

I want to remind you, gentle reader, that the reason most search and content processing systems are very much alike has a simple explanation. Most are built using the same 10 components identified in the 2008 paper.

Consider that the next time you plunk down big money for a proprietary system. For most business tasks, open source solutions are substantially similar in core functionality without the hefty price tag for the license, bespoke engineering, and a development cycle more mysterious than the pronouncements of the oracle at Delphi.

Stephen E Arnold, June 23, 2015

Google: Is This an X Lab for Real Journalists

June 23, 2015

I have a colleague who retired. The newspaper for which he worked continued to make life interesting for those over the age of 55. I assume that other real journalists have discovered that the appetite for those born before 1950 is changing. Bring on the younger journalism grads. YouTube savvy? Great. A high traffic blog about veganism? Come on down. A Web site which is a magnet for Python programmers? Hey, want to work for us?

When I read “Introducing the News Lab,” I had four different thoughts:

  1. What a great idea
  2. Quite a pool of unemployed, under employed, and want to be professionals to tap
  3. How many publishers are like hungry bass in a big lake at a fishing tournament?
  4. How many journalists know how to make Google’s system sing and dance like a top billing at a vaudeville show?

According to the write up:

It’s hard to think of a more important source of information in the world than quality journalism. At its best, news communicates truth to power, keeps societies free and open, and leads to more informed decision-making by people and leaders. In the past decade, better technology and an open Internet have led to a revolution in how news is created, distributed, and consumed. And given Google’s mission to ensure quality information is accessible and useful everywhere, we want to help ensure that innovation in news leads to a more informed, more democratic world.

There you go. What about the right to be forgotten, filtering, predictive search results, and ads? Once again I am mashing up the math club’s manifesto with reality.

The idea is that the journalists embracing the GOOG will use the GOOG to produce content. I learned:

There’s a revolution in data journalism happening in newsrooms today, as more data sets and more tools for analysis are allowing journalists to create insights that were never before possible. To help journalists use our data to offer a unique window to the world, last week we announced an update to our Google Trends platform. The new Google Trends provides journalists with deeper, broader, and real-time data, and incorporates feedback we collected from newsrooms and data journalists around the world. We’re also helping newsrooms around the world tell stories using data, with a daily feed of curated Google Trends based on the headlines of the day, and through partnerships with newsrooms on specific data experiments.

The attentive reader will notice that I have removed the numerous links in the article. Clicking around in the middle of an important article is not something I do or encourage.

Will the News Lab deliver the benefits journalists expect and the benefit some folks need? Will Google “put wood behind” this initiative or will it suffer the same fate as Web Accelerator? Will the service generate more magnetism than the many news efforts nosing into the datasphere? Will publishers jump with glee because Google empowers new content?

No answers yet.

Stephen E Arnold, June 23, 2015

New Analysis Tool for Hadoop Data from Oracle

June 23, 2015

Oracle offers new ways to analyze Hadoop data, we learn from the brief write-up, “Oracle Zeroes in on Hadoop Data with New Analytics Tool” at PCWorld. Use of the Hadoop open-source distributed file system continues to grow among businesses and other organizations, so it is no surprise to see enterprise software giant Oracle developing such tools. This new software is dubbed Oracle Big Data Spatial and Graph. Writer Katherine Noyes reports:

“Users of Oracle’s database have long had access to spatial and graph analytics tools, which are used to uncover relationships and analyze data sets involving location. Aiming to tackle more diverse data sets and minimize the need for data movement, Oracle created the product to be able to process data natively on Hadoop and in parallel using MapReduce or in-memory structures.

“There are two main components. One is a distributed property graph with more than 35 high-performance, parallel, in-memory analytic functions. The other is a collection of spatial-analysis functions and services to evaluate data based on how near or far something is, whether it falls within a boundary or region, or to process and visualize geospatial data and imagery.”

The write-up notes that such analysis can reveal connections for organizations to capitalize upon, like relationships between customers or assets. The software is, of course, compatible with Oracle’s own Big Data Appliance platform, but can be deployed on other Hadoop and NoSQL systems, as well.
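
For readers who wonder what "how near or far something is" and "whether it falls within a boundary" look like in practice, here is a minimal sketch in plain Python. It is not Oracle's API and makes no claim about how the product works. The function names, the sample customers, and the store location are all made up for illustration, and the distance uses the standard haversine approximation on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_bounding_box(lat, lon, south, west, north, east):
    """True if the point lies inside a simple lat/lon rectangle.

    Assumes the box does not cross the antimeridian.
    """
    return south <= lat <= north and west <= lon <= east


def customers_near(customers, lat, lon, radius_km):
    """Names of customers within radius_km of the given point."""
    return [name for name, c_lat, c_lon in customers
            if haversine_km(c_lat, c_lon, lat, lon) <= radius_km]


# Hypothetical sample data: (name, latitude, longitude)
CUSTOMERS = [
    ("A", 38.25, -85.76),
    ("B", 40.71, -74.01),
]

if __name__ == "__main__":
    # Hypothetical store location and a 50 km radius
    print(customers_near(CUSTOMERS, 38.20, -85.70, 50.0))
    print(in_bounding_box(38.25, -85.76, 38.0, -86.0, 38.5, -85.5))
```

A system working at Hadoop scale would run checks like these in parallel across many records and would rely on spatial indexes rather than a linear scan. The sketch only shows the kind of question being asked.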

Cynthia Murrell, June 23, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Major SharePoint Features Disclosed

June 23, 2015

SharePoint Server 2016 has caused quite a stir, with users wondering what features will come through in the final version. At Microsoft Ignite last month, rumors turned to legitimate features. Read more about separating fact from fiction in the newest SharePoint release in the CIO article, “Top 4 Revelations about SharePoint.”

The article begins:

“Some of the biggest news to come out of Microsoft Ignite last month was the introduction and the first public demonstration of SharePoint Server 2016 – a demo that quelled a lot of speculation and uneasiness in the SharePoint administrator community. Here are the biggest takeaways from the conference, with an emphasis on the on-premises product.”

The article goes on to say that users can look forward to a full on-premises version, bolstered administrative features, four roles to divide the workload, and an emphasis on hybrid functions. For users who need to stay in the loop with SharePoint updates and changes, stay tuned to ArnoldIT.com. Stephen E. Arnold is a longtime leader in search, and his Web site offers a unique SharePoint feed to keep all the latest tips, tricks, and news in one convenient location.

Emily Rae Aldridge, June 23, 2015


MIT Discovers Object Recognition

June 23, 2015

MIT did not discover object recognition, but researchers did find that a deep-learning system designed to recognize and classify scenes can also be used to recognize individual objects. Kurzweil describes the exciting development in the article, “MIT Deep-Learning System Autonomously Learns To Identify Objects.” The MIT researchers realized that deep learning could be used for object identification when they were training a machine to identify scenes. They compiled a library of seven million entries categorized by scene, and then learned that object recognition and scene recognition could work in tandem.

“ ‘Deep learning works very well, but it’s very hard to understand why it works — what is the internal representation that the network is building,’ says Antonio Torralba, an associate professor of computer science and engineering at MIT and a senior author on the new paper.”

When the deep-learning network was processing scenes, it was fifty percent accurate compared to a human’s eighty percent accuracy. While the network was busy identifying scenes, it was also learning how to recognize objects. The researchers are still trying to work out the kinks in the deep-learning process and have decided to start over. They are retraining their networks on the same data sets, but taking a new approach to see whether scene and object recognition tie together or go in different directions.

Deep-learning networks have major ramifications, including improvements for many industries. However, will deep learning be applied to basic search? Image search still does not work well when you search by an actual image.

Whitney Grace, June 23, 2015

Mobile Search: A Rising Tide Lifts Boats Which Are in the Water and Float

June 22, 2015

Data about mobile search are plentiful. Everyone from unemployed middle school teachers to failed webmasters telling real stories outputs mobile information. For a venture firm’s viewpoint, navigate to “Presentation: Mobile Is Eating the World.” There is a slide deck and a video for those too busy to read.

The assumption on which the presentation is built appears to be “Mobile is making technology universal.” The idea strikes me as one which the original AT&T formulated, but that is just useful technical DNA. Humans communicate. Mobile phones do the job. Ergo: Mobile is a big deal.

One of the more interesting visuals in the presentation is a reminder to companies whose business model is built on traditional approaches to computing. I wonder if Google realizes that its ad-centric revenue is a product of the desktop computing era? Here’s the graphic I printed out and tucked into my “Doom Is Approaching” folder:

[Graphic from the presentation not reproduced]

The data come from the mid tier consulting firm Gartner, but the nifty orange line suggests that mid tier consultants along with other traditional outfits may have to rejigger their approach to the brave new mobile world.

If you are looking for information to support a mobile search initiative, Andreessen Horowitz has just what you may need. The presentation does not explore the information access limitations of mobile ubiquity. My hunch is that no one cares. The information world fits nicely in a small display.

Stephen E Arnold, June 23, 2015

Non Government Statistical Atlas: Filling a Void

June 22, 2015

Short honk. Navigate to “Reviving the Statistical Atlas of the United States with New Data.” Nathan Yau, who bills himself as Superintendent of Flowing Data, produced a graphical statistical atlas. Believe me when I suggest that the US government should probably do this type of report. As far as I know, the US government has not and might tie itself in knots trying to match his work. If you are interested in crops and education along with dozens of other data sets about the US, Dr. Yau’s work warrants your time. A happy quack to “some guy with a blog” for this atlas.

Stephen E Arnold, June 23, 2015

The Amazon Cloud: Complexity and a Chance for High Winds

June 22, 2015

Short honk: I read “AWS Deployment Tools: Choosing the Right Application Service.” The write up explains “three distinct services aimed at simplifying and automating project deployment and management.” The write up tackles in less than 500 words Elastic Beanstalk (not to be confused with Elastic, the search service which can be deployed on Amazon), CloudFormation (not to be confused with the other clouds or the weather oriented clouds), and OpsWorks (not to be confused with government ops work). What I find interesting is that those who want to embrace Amazon’s cloud services may be surprised that the learning cost may be higher than the actual cost of Amazon’s cloud services. This is neither good nor bad. The complexity is a reminder that computing today is not necessarily easier, simpler, or more straightforward than it was in the days of the good old mainframe. IBM did provide customer support in the 1960s. You will have to determine how helpful Amazon’s technical support is when you fly to the Amazon cloud.

Stephen E Arnold, June 22, 2015

Publishers Want to Dejuice Apple, Squash It

June 22, 2015

I read “Publishers Slam Apple over Presumptuous News App Conditions.” Publishers presumptuous? I know of one publisher who used my research and marketed it on Amazon without my permission. Was that presumptuous of IDC and its wizard Dave Schubmehl?

According to the write up:

Publishers are up in arms following an email from Apple about inclusion in the firm’s upcoming News application and the kind of conditions that will be imposed. The email said that participants are presumed to have accepted Apple’s terms unless they explicitly opt out. It’s the old opt-out over opt-in thing.

Yes, up in arms. I can see the publishers at the New York Athletic Club wielding their squash rackets with malice. My goodness, what a chilling thought. What if those white clad clubsters were to descend on the Apple store in Manhattan and threaten the geniuses?

My fears subsided when I read:

The service will draw content from publicly available RSS feeds, and it is possible that Apple will be challenged, according to one expert, but not in any really meaningful way.

My concern for a Squash Assault receded. Publishers may have to retire to the Yacht Club to find another option.

Stephen E Arnold, June 22, 2015

Bing Search: Pump Up the Music

June 22, 2015

My approach to online research is to look for information. I know that I have looked up Mozart in the past. Viewed in the aggregate, I look for high technology companies, people involved in high technology, and products which embody technology. Music videos are not what makes my intellectual engines sing along.

I read “Bing Wants to Become the Search Engine of Choice for Music Videos.” Good for Bing. There are many Web pages which exist to be indexed. There are search challenges to resolve. There is the problem of Microsoft index silos. Have you done a query in Bing and wished that there were relevant links to content in Microsoft’s academic index? Well, too bad.

According to the write up:

The update adds larger thumbnails to Bing’s video section, which now displays additional information about each clip, like channel, upload date and view count. Users also have the option to watch a preview of each clip within their search results, and explore related queries more easily.

The new Microsoft is definitely innovating in search aimed at those who are hungry for music videos. What’s the next innovation? Video games? Online horoscopes? Nail polish colors?

Stephen E Arnold, June 22, 2015

 
