A Xoogler Fixes Yahoo Mobile Search

June 27, 2015

If you have not explored Yahoo Search, give it a whirl. Try to find information about these topics:

The query “Yahoo Search” displays this result:

image

Note that the second hit is to Tumblr. There you go. The other hits point to the very same page I used to launch my search for “Yahoo Search.” Helpful?

Try this query: “price diapers”. On the left side of the results page, Yahoo displayed:

image

On the right side of the results page, Yahoo displayed:

image

These are prices from advertisers. Oh, there is a link to something called Yahoo Shopping. Okay, that is one way to generate revenue and create an extra click. Annoying to me. To Yahoo, fulfillment and joy.

Also, try this query: “Dark Web paste sites”.

Here’s the results page:

image

Ads and two links to Dot ONION addresses. I am not sure the typical Yahoo user will know what to make of this result:

image

I suppose I can find some positives in these results pages. On the other hand, the impact for me was inconsistency.

Navigate now to “Yahoo Search Becomes More Like Google on Mobile Devices.” The headline tells the story. Yahoo is lost in search space, so the Xoogler running the Yahoo comedy hour is imitating Google.

So much for innovation. One hopes the approach works because when Yahoo is left to its own devices, the information access thing is a bit like a rice cake and water to a Big O tire changer taking a break from three hours of roadside work in the blazing sun.

Stephen E Arnold, June 27, 2015

Matchlight Lights Up Stolen Data

June 26, 2015

It is a common gimmick on crime shows for the computer expert to be able to locate information, often stolen data, by using a few clever hacking tricks.  In reality it is not that easy and quick to find stolen data, but eWeek posted an article about a new intelligence platform that might be able to do the trick: “Terbium Labs Launches Matchlight Data Intelligence Platform.”  Terbium Labs’ Matchlight is able to recover stolen data as soon as it is released on the Dark Web.

How it works is simply remarkable.  Matchlight attaches digital fingerprints to a company’s files, down to the smallest byte.  Data recovered on the Dark Web can then be matched against Terbium Labs’ database.  Matchlight is available under a SaaS model.  Another option for clients is a one-way fingerprinting feature that keeps a company’s data private from Terbium Labs, which would only have access to the digital fingerprints in order to track the data.  Matchlight can also be integrated into existing SharePoint or other document management systems.  The entire approach behind Matchlight is to take a proactive stance toward data, rather than a defensive one.
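The one-way fingerprinting idea can be illustrated with a minimal sketch. This is not Terbium Labs’ method, which the eWeek article does not detail; it simply assumes that a "fingerprint" is a SHA-256 hash of each overlapping 16-byte slice of a file. The function names, the window size, and the sample records are all hypothetical.

```python
import hashlib

def fingerprints(data, window=16):
    """Hash every overlapping `window`-byte slice of `data` with SHA-256."""
    if len(data) < window:
        return {hashlib.sha256(data).hexdigest()}
    return {
        hashlib.sha256(data[i:i + window]).hexdigest()
        for i in range(len(data) - window + 1)
    }

def overlap(client_prints, found, window=16):
    """Fraction of the found text's fingerprints that match the client's set."""
    found_prints = fingerprints(found, window)
    return len(found_prints & client_prints) / len(found_prints)

# The client computes fingerprints locally and shares only the hashes,
# so the monitoring service never sees the underlying record.
secret = b"Customer 4411: card ending 9921, expires 08/17"
shared = fingerprints(secret)

# Hypothetical text found later on a paste site, plus an unrelated sample.
leak = b"dump: Customer 4411: card ending 9921, expires 08/17 -- more"
unrelated = b"The quick brown fox jumps over the lazy dog again and again"

print(overlap(shared, leak))       # large fraction: most slices match
print(overlap(shared, unrelated))  # no slices match
```

Because a hash cannot practically be reversed, the party holding only `shared` can detect a match without being able to reconstruct the original data, which is the privacy property the one-way option describes.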

“We see the market shifting toward a risk management approach to information security,” [Danny Rogers, CEO and co-founder of Terbium] said. “Previously, information security was focused on IT and defensive technologies. These days, the most innovative companies are no longer asking if a data breach is going to happen, but when. In fact, the most innovative companies are asking what has already happened that they might not know about. This is where Matchlight provides a unique solution.”

Across the board, data breaches are becoming common and Matchlight offers an automated way to proactively protect data.  While the digital fingerprinting helps track down stolen data, does Terbium Labs have a way to prevent it from being stolen at all?

Whitney Grace, June 26, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Digital Reasoning a Self-Described Cognitive Computing Company

June 26, 2015

The article titled Spy Tools Come to the Cloud on Enterprise Tech shows how Amazon’s work with analytics companies on behalf of the government has produced platforms like “GovCloud,” with increased security. The presumed purpose of such platforms is intelligence gathering and threat analysis at big data scale. The article explains,

“The Digital Reasoning cognitive computing tool is designed to generate “knowledge graphs of connected objects” gleaned from structured and unstructured data. These “nodes” (profiles of persons or things of interest) and “edges” (the relationships between them) are graphed, “and then being able to take this and put it into time and space,” explained Bill DiPietro, vice president of product management at Digital Reasoning. The partners noted that the elastic computing capability… is allowing customers to bring together much larger datasets.”

For former CIA staff officer DiPietro, it logically follows that bigger questions can be answered by the data with tools like the AWS GovCloud and subsequent Hadoop ecosystems. He cites the challenge to overcome: quickly spotlighting and identifying someone on a watch list in the haystack of people. They call it “cluster on demand,” the process that allows them to manage and bring together data.
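The "nodes" and "edges" placed in "time and space" can be pictured with a small sketch. It is not Digital Reasoning’s implementation; the `Edge` fields, the helper functions, and every record below are hypothetical and serve only to show the data structure the quote describes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    source: str    # node: a person or thing of interest
    relation: str  # how the two nodes are connected
    target: str
    when: str      # ISO date, so string order equals date order
    where: str

# Hypothetical relationships extracted from structured and unstructured data.
edges = [
    Edge("Person A", "called", "Person B", "2015-06-01", "City X"),
    Edge("Person B", "met", "Person C", "2015-06-03", "City Y"),
    Edge("Person D", "emailed", "Person E", "2015-06-04", "City Z"),
]

def timeline(edges, node):
    """Return every edge that touches `node`, oldest first."""
    hits = [e for e in edges if node in (e.source, e.target)]
    return sorted(hits, key=lambda e: e.when)

def linked_to_watch_list(edges, watch_list):
    """Return nodes directly connected to anyone on the watch list."""
    linked = set()
    for e in edges:
        if e.source in watch_list:
            linked.add(e.target)
        if e.target in watch_list:
            linked.add(e.source)
    return linked - set(watch_list)

print(linked_to_watch_list(edges, {"Person C"}))
for e in timeline(edges, "Person B"):
    print(e.when, e.where, e.source, e.relation, e.target)
```

A real system would hold millions of edges in a distributed store; the point here is only that a watch-list lookup reduces to walking edges that carry a time and a place.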

Chelsea Kerwin, June 26, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Story Telling and Search: Smartlogic Fiction

June 25, 2015

One of my two or three readers sent me a link to an article appearing in the Smartlogic Web log. I found the write up unusual. You may want to check it out: Surviving without Content Intelligence? There’s an Elephant in the Room. The first chapter is here.

The approach is to tell a story which explains the value of Smartlogic’s content intelligence approach. I circled this passage in pale blue:

The OLAP cube and MDM solution he’s spent the first half of the year implementing [you can read about it here] is not going to help him with the emails, call records and file system data that he is being asked to include. He’d always known that 80% of an organization’s data was unstructured – he had hoped that they could get away with the 20% that was structured and easily managed. Now he’s got four times more data to work with, and he can’t just shovel it into the CRM system and hope they can deal with it.

The “read about it here” does not link to anything.

If the story resonates with you, Smartlogic may be exactly what you require.

The subhead “Next Week” includes this passage:

The Smartlogic Semaphore Search Application Framework is a tool for rapidly developing search applications that uniquely combine a Semantic Model with commodity tools such as SOLR and the Google Search Appliance, so users are not restricted to keywords, but can search by meaning as well which dramatically improves the user experience. Last, but not least, the Semaphore Classification Server would have allowed Archie to reliably link structured data and unstructured content without being dependent on existing structures and metadata; but that’s a story for next week.

I found one word fascinating, “commodity.” I think of the Google Search Appliance as an expensive way to process large volumes of content. The GSA no longer takes a one size fits all approach, but it is expensive to set up with fail over and customized functions. Solr is an open source solution perched on top of Lucene. A number of companies offer implementations of these open source products. The current stallion winning races is Elastic, but that is not a commodity like diapers.

The “story” is not complete. Part three will become available soon. Stay tuned.

Stephen E Arnold, June 25, 2015

How the Cloud Might Limit SharePoint Functionality

June 25, 2015

In the highly anticipated SharePoint Server 2016, on-premises, cloud, and hybrid functionality are all emphasized. However, some are beginning to wonder if functionality can suffer based on the variety of deployment chosen. Read all the details in the Search Content Management article, “How Does the Cloud Limit SharePoint Search and Integration?”

The article begins:

“All searches are not created equal, and tradeoffs remain for companies mulling deployment of the cloud, on-premises and hybrid versions of Microsoft’s collaboration platform, SharePoint. SharePoint on-premises has evolved over the years with a focus on customization and integration with other internal systems. That is not yet the case in the cloud with SharePoint Online, and there are still unique challenges for those who look to combine the two products with a hybrid approach.”

The article goes on to say that there are certain restrictions, especially with search customization, for the SharePoint Online deployment. Furthermore, a good amount of configuration is required to maximize search for the hybrid version. To keep up to date on how this might affect your organization, and the required workarounds, stay tuned to ArnoldIT.com. Stephen E. Arnold is a longtime search professional, and his work on SharePoint is conveniently collocated in a dedicated feed to maximize efficiency.

Emily Rae Aldridge, June 25, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Twitter Gets a Search Facelift

June 25, 2015

Twitter has been experimenting with improving its search results, and according to TechCrunch the upgrade comes via a new search results interface: “Twitter’s New Search Results Interface Expands To All Users.”  The new search results interface is one of the largest updates Twitter has made in 2015.  It is supposed to increase ease of use with a cleaner look and better filtering options.  Users will now be able to filter search results by live tweets, photos, videos, news, accounts, and more.

Twitter made the update to help people better understand how to use the message service and to take a more active approach to using it, rather than passively reading other people’s tweets.  The update is specifically targeted at new Twitter users.

The tweaked search interface will return tweets related to the search phrase or keyword, but that does not mean that the most popular tweets are returned:

“In some cases, the top search result isn’t necessarily the one with the higher metrics associated with it – but one that better matches what Twitter believes to be the searcher’s “intent.” For example, a search for “Steve Jobs” first displays a heavily-retweeted article about the movie’s trailer, but a search for “Mad Men” instead first displays a more relevant tweet ahead of the heavily-favorited “Mad Men” mention by singer Lorde.”

The new interface proves to be simpler and better lists trends, related users, and news.  It does take a little while to finesse Twitter, which is a daunting task for new users.  Twitter is not the most popular social network these days, and it is using these updates to increase its appeal.

Whitney Grace, June 25, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Old Wine: Semantic Search from the Enlightenment

June 24, 2015

I read a weird disclaimer. Here it is:

This is an archived version of Pandia’s original article “Top 5 Semantic Search Engines”, we made it available to the users mainly because it is still among the most sought articles from old site. You can also check kids, radio search, news, people finder and q-cards sections.

An article from the defunct search newsletter Pandia surfaced in a news aggregation list. Pandia published one of my books, but at the moment I cannot remember which of my studies.

The write up identifies “semantic search engines.” Here’s the list with my status update in bold face:

  • Hakia. Out of business.
  • SenseBot. Out of business.
  • Powerset. Bought by Microsoft. Fate unknown in the new Delve/Bing world.
  • DeepDyve. Talks about semantics, but the system is a variation of the Dialog/BRS for fee search model from the late 1970s.
  • Cognition (Cognition Technologies). May be a unit of Nuance?

What’s the score?

Two failures. Two sales to another company. One survivor which has an old school business model. My take? Zero significant impact on information retrieval.

Feel free to disagree, but the promise of semantic search seems to pivot on finding a buyer and surviving by selling online research. Why so much semantic cheerleading? Beats me. Semantic methods are useful in the plumbing as a component of a richer, more robust system. Most cyberOSINT systems follow this path. Users don’t care too much about plumbing in my experience.

Stephen E Arnold, June 24, 2015

Deep Learning System Surprises Researchers

June 24, 2015

Researchers were surprised when their scene-classification AI performed some independent study, we learn from Kurzweil’s article, “MIT Deep-Learning System Autonomously Learns to Identify Objects.”

At last December’s International Conference on Learning Representations, a research team from MIT demonstrated that their scene-recognition software was 25-33 percent more accurate than its leading predecessor. They also presented a paper describing the object-identification tactic their software chose to adopt; perhaps this is what gave it the edge. The paper’s lead author, MIT computer science and engineering associate professor Antonio Torralba, ponders the development:

“Deep learning works very well, but it’s very hard to understand why it works — what is the internal representation that the network is building. It could be that the representations for scenes are parts of scenes that don’t make any sense, like corners or pieces of objects. But it could be that it’s objects: To know that something is a bedroom, you need to see the bed; to know that something is a conference room, you need to see a table and chairs. That’s what we found, that the network is really finding these objects.”

Researchers being researchers, the team is investigating their own software’s initiative. The article tells us:

“In ongoing work, the researchers are starting from scratch and retraining their network on the same data sets, to see if it consistently converges on the same objects, or whether it can randomly evolve in different directions that still produce good predictions. They’re also exploring whether object detection and scene detection can feed back into each other, to improve the performance of both. ‘But we want to do that in a way that doesn’t force the network to do something that it doesn’t want to do,’ Torralba says.”

Very respectful. See the article for a few more details on this ambitious AI, or check out the researchers’ open-access paper here.

Cynthia Murrell, June 24, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Search and Math: Stuck in a Rut?

June 23, 2015

I read a paper several years ago. Okay, maybe it was in 2008. You can find the write up here. There is a more recent review of the information in that “Top 10 Algorithms in Data Mining” article here. To make a long story short, most of the search and content processing systems use these tried-and-true methods. Most of them can be implemented using the guidelines in computer science textbooks, and there are plenty of examples to ensure that none of the search and content processing systems fall prey to the Big O issue.
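To give a sense of how compact these textbook methods are, here is a minimal sketch of k-nearest neighbors, one of the ten algorithms on the 2008 list. The toy two-dimensional data, the labels, and the function name are assumptions made for illustration; no vendor’s system is being reproduced.

```python
from collections import Counter
import math

def knn_predict(train, point, k=3):
    """Label `point` by majority vote among its k nearest training examples."""
    # Sort the training examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors, each paired with a made-up category label.
train = [
    ((1.0, 1.0), "search"),
    ((1.2, 0.8), "search"),
    ((0.9, 1.1), "search"),
    ((5.0, 5.0), "analytics"),
    ((5.2, 4.9), "analytics"),
    ((4.8, 5.1), "analytics"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.0, 4.8)))
```

This version sorts the whole training set for every query, which is exactly the kind of cost that grows painful at scale; production systems typically swap in an index structure to keep lookups fast.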

Against this background, I read with interest “The Top 10 Mathematical Achievements of the Last 5ish Years.” I like the specificity of “5ish.” Good math thinking in today’s fuzzy algorithm environment.

The idea is that the write up reviews math which sticks up like mountain tops above the cloud layer in the Peruvian Andes. Three of the 10 items snagged my interest, which is skewed by my bias toward search and content processing. Here are the three I highlighted from the 10 in the useful write up:

  • Bounded gaps between primes. Perhaps the approach will benefit those engaged in making and breaking cryptography in the next year or so?
  • Voevodsky’s Homotopy Type Theory. How can one go wrong with new thoughts on fundamental math?
  • Work on The Fundamental Lemma. Gimme some old time group/set religion with potentially useful new handles with which to grab groups.

Now how will search and content processing benefit? For now, not too much. The problem is that innovations in math cannot be applied to most of today’s information processing systems. There are computational considerations, and there are other tasks which need more attention than the plumbing; namely, how can a vendor get the system to output information a licensee can actually use in real life.

I want to remind you, gentle reader, that the reason most search and content processing systems are very much alike has a simple explanation. Most are built using the same 10 components identified in the 2008 paper.

Consider that the next time you plunk down big money for a proprietary system. For most business tasks, open source solutions are substantially similar in core functionality without the hefty price tag for the license, bespoke engineering, and a development cycle more mysterious than the pronouncements of the oracle at Delphi.

Stephen E Arnold, June 23, 2015

MIT Discovers Object Recognition

June 23, 2015

MIT did not discover object recognition, but researchers did show that a deep-learning system designed to recognize and classify scenes can also be used to recognize individual objects.  Kurzweil describes the exciting development in the article, “MIT Deep-Learning System Autonomously Learns To Identify Objects.”  The MIT researchers realized that deep learning could be used for object identification when they were training a machine to identify scenes.  They had compiled a library of seven million entries categorized by scene when they learned that object recognition and scene recognition could work in tandem.

“ ‘Deep learning works very well, but it’s very hard to understand why it works — what is the internal representation that the network is building,’ says Antonio Torralba, an associate professor of computer science and engineering at MIT and a senior author on the new paper.”

When the deep-learning network was processing scenes, it was fifty percent accurate compared to a human’s eighty percent accuracy.  While the network was busy identifying scenes, at the same time it was learning how to recognize objects as well.  The researchers are still trying to work out the kinks in the deep-learning process and have decided to start over.  They are retraining their networks on the same data sets, but taking a new approach to see how scene and object recognition tie in together or if they go in different directions.

Deep-learning networks have major ramifications, including improvements for many industries.  However, will deep learning be applied to basic search?  Image search still does not work well when you search by an actual image.

Whitney Grace, June 23, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
