
Google Enables Users to Delete Search History, Piece by Piece

August 31, 2016

The article on CIO titled Google Quietly Brings Forgetting to the U.S. draws attention to the fact that Google has enabled Americans to view and edit their search history. Simply visit My Activity and log in to witness the mind-boggling amount of data Google has collected over your search career. Deleting a single item takes just two clicks. But the article points out that deleting a large number of searches will require an afternoon dedicated to cleaning up your history. And afterward you might find that your searches are less customized, as are your ads and autofills. But the article emphasizes a more communal concern,

There’s something else to consider here, though, and this has societal implications. Google’s forget policy has some key right-to-know overlaps with its takedown policy. The takedown policy allows people to request that stories about or images of them be removed from the database. The forget policy allows the user to decide on his own to delete something…I like being able to edit my history, but I am painfully aware that allowing the worst among us to do the same can have undesired consequences.

Of course, by “the worst among us” he means terrorists. But for many people, the right to privacy outweighs the hypothetical risk that terrorists will exploit the same deletion tools within a more totalitarian, Big Brother state. Indeed, Google’s claim that the search history information is entirely private is already suspect. If Google personnel or Google partners can see this data, doesn’t that mean it is no longer private?

Chelsea Kerwin, August 31, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

The Equivalent of a Brexit

August 31, 2016

Britain’s historic vote to leave the European Union set a precedent. But what is that precedent? Is it the choice to leave an organization? The choice to maintain its independence? Or is it a basic example of the right to choose? Brexit will be used as a metaphor for any major upheaval for the next century, so how can it be applied in a technology context? BA Insight gives us the answer with “Would Your Users Vote ‘Yes’ For Sharexit?”

SharePoint is Microsoft Office’s collaborative content management program. It can be used to organize projects, build Web sites, store files, and allow team members to communicate. Office workers across the globe also spurn it due to its inefficiencies. To avoid a Sharexit in your organization, the article offers several ways to improve a user’s SharePoint experience. One of the easiest is to build an individual user interface that handles little tasks to make a user’s life easier. Personalizing the individual SharePoint user experience is another method, so the end user does not feel like another cog in the system but rather that SharePoint was designed for them. Two other suggestions are plain, simple advice: take user feedback and actually use it, and make SharePoint the go-to information center for the organization by putting everything on it.

Perhaps the best advice is making information easy to find on SharePoint:

Documents are over here, discussions over there, people are that way, and then I don’t know who the experts really are.  You can make your Intranet a whole lot smarter, or dare we say “intelligent”, if you take advantage of this information in an integrated fashion, exposing your users to connected, but different, information.  You can connect documents to the person who wrote them, then to that person’s expertise and connected colleagues, enabling search for your hidden experts. The ones that can really be helpful often reduce chances for misinformation, repetition of work, or errors. To do this, expertise location capabilities can combine contributed expertise with stated expertise, allowing for easy searching and expert identification.
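The expertise-location idea in the quote above amounts to combining contributed expertise (what people write) with stated expertise (what people claim). A minimal sketch of that combination, with invented documents, names, and topics:

```python
from collections import defaultdict

# Hypothetical documents and self-declared expertise, purely to illustrate
# the "connect documents to the person who wrote them" idea from the quote.
documents = [
    {"title": "Migration Plan", "author": "Ana", "text": "sharepoint migration tenant"},
    {"title": "Search Tuning", "author": "Raj", "text": "search relevance ranking"},
    {"title": "Ranking Notes", "author": "Raj", "text": "ranking signals relevance"},
]
stated_expertise = {"Ana": {"governance"}, "Raj": {"search"}}

def find_experts(query_term):
    """Score people by contributed expertise (authored documents mentioning
    the term) plus stated expertise (self-declared topics)."""
    scores = defaultdict(int)
    for doc in documents:
        if query_term in doc["text"].split():
            scores[doc["author"]] += 1          # contributed expertise
    for person, topics in stated_expertise.items():
        if query_term in topics:
            scores[person] += 1                 # stated expertise
    return sorted(scores, key=scores.get, reverse=True)

print(find_experts("relevance"))
```

A real deployment would replace the word-match with an actual search engine, but the shape of the index is the same: terms to documents to people.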

Developers love SharePoint because it is easy to manage and to roll out information or software to every user. End users hate it because it creates more problems than it resolves. If developers take the time to listen to what the end users need from their SharePoint experience, they can avoid a Sharexit.

Whitney Grace, August 31, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Smart Software Pitfalls: A List-Tickle

August 26, 2016

Need page views? Why not try a listicle or, as we say here in Harrod’s Creek, a “list-tickle.”

In order to understand the depth of thought behind “13 Ways Machine Learning Can Steer You Wrong,” one must click 13 times. I wonder if the managers responsible for this PowerPoint approach to analysis handed in their college work on 5×8-inch note cards and required that the teacher ask for each one individually.

What are the ways machine learning can steer one into a ditch? As Ms. Browning said in a single poem on one sheet of paper, “Let me count the ways.”

  1. The predictions output by the Fancy Dan system are incorrect. Fancy that.
  2. One does not know what one does not know. This reminds me of a Donald Henry Rumsfeld koan. I love it when real journalists channel the Rumsfeld line of thinking.
  3. Algorithms are not in line with reality. Mathematicians and programmers are reality. What could these folks do that does not match the Pabst Blue Ribbon beer crowd at a football game? Answer: Generate useless data unrelated to the game and inebriated fans.
  4. Biased algorithms. As I pointed out in this week’s HonkinNews, numbers are neutral. Humans, eh, not often.
  5. Bad hires. There you go. Those LinkedIn expertise graphs can be misleading.
  6. Cost lots of money. Most information technology projects cost a lot of money even when they are sort of right. When they are sort of wrong, one gets Hewlett Packard-Autonomy like deals.
  7. False assumptions. My hunch is that this is Number Two wearing lipstick.
  8. Recommendations unrelated to the business problem at hand. This is essentially Number One with a new pair of thrift store sneakers.
  9. Click an icon, get an answer. The Greek oracles required supplicants to sleep off a heady mixture of wine and herbs in a stone room. Now one clicks an icon when one is infused with a Starbuck’s tall, no fat latte with caramel syrup.
  10. GIGO or garbage in, garbage out. Yep, that’s what happens when one cuts the statistics class when the professor talks about data validity.
  11. Looking for answers the data cannot deliver. See Number Five.
  12. Wonky outcomes. Hey, the real journalist is now dressing a Chihuahua in discarded ice skating attire.
  13. “Blind Faith.” Isn’t this a rock and roll band? When someone has been using computing devices since the age of four, that person is an expert and darned sure the computer speaks the truth like those Greek oracles.

Was I more informed after clicking 13 times? Nope.

Stephen E Arnold, August 26, 2016

Real Time Data Analysis for Almost Anyone

August 25, 2016

The idea that Everyman can tap into a real time data stream and perform “analyses” is like catnip for some. The concept appeals to those in the financial sector, but these folks often have money (yours and mine) to burn. The idea seems to snag the attention of some folks in the intelligence sector who want to “make sense” out of Twitter streams and similar flows of “social media.” In my experience, big outfits with a need to tap into data streams have motivation and resources. Most of those who fit into my pigeonhole have their own vendors, systems, and methods in place.

The question is, “Does Tom’s Trucking need to tap into real time data flows to make decisions about what paint to stock or what marketing pitch to use on the business card taped to the local grocery’s announcement board?”

I plucked from my almost real time Web information service (Overflight) two articles suggesting that there is money in “them thar hills” of data.

The first is “New Amazon Service Uses SQL To Query Streaming Big Data.” Amazon is a leader in the cloud space. The company may not be number one on the Gartner hit parade, but some of those with whom I converse believe that Amazon continues to be the cloud vendor to consider and maybe use. The digital Wal-Mart has demonstrated both revenue and innovation with its cloud business.

The article explains that Amazon has picked up the threads of Hadoop, SQL, and assorted enabling technologies and woven them into Amazon Kinesis Analytics. The idea is that Amazon delivers a piping hot Big Data pizza via a SQL query. The write up quotes an Amazon wizard as saying:

“Being able to continuously query and gain insights from this information in real-time — as it arrives — can allow companies to respond more quickly to business and customer needs,” AWS said in a statement. “However, existing data processing and analytics solutions aren’t able to continuously process this ‘fast moving’ data, so customers have had to develop streaming data processing applications — which can take months to build and fine-tune — and invest in infrastructure to handle high-speed, high-volume data streams that might include tens of millions of events per hour.”

Additional details appear in Amazon’s blog post here. The idea is that anyone with some knowledge of things Amazon, coding expertise, and a Big Data stream can use the Amazon service.
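What does a “continuous SQL query over a stream” actually compute? Conceptually, something like a tumbling-window aggregation. Here is a plain Python sketch of that idea (this mimics the model, not Amazon’s API; the event stream is invented):

```python
from collections import Counter

# Conceptual sketch of what a streaming query such as
#   SELECT item, COUNT(*) FROM stream GROUP BY item  (10-second tumbling window)
# computes as events arrive. Plain Python, not the Kinesis Analytics API.
def tumbling_window_counts(events, window_seconds=10):
    """events: iterable of (timestamp, item). Yields (window_start, Counter)
    for each completed tumbling window, in arrival order."""
    current_window, counts = None, Counter()
    for ts, item in events:
        window = ts - (ts % window_seconds)
        if current_window is None:
            current_window = window
        if window != current_window:
            yield current_window, counts
            current_window, counts = window, Counter()
        counts[item] += 1
    if counts:
        yield current_window, counts

stream = [(1, "click"), (4, "buy"), (12, "click"), (13, "click")]
for start, counts in tumbling_window_counts(stream):
    print(start, dict(counts))
```

The Amazon pitch is that you write the SQL and the service handles the windowing, scaling, and infrastructure that this toy loop hand-waves past.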

The second write up is “Microsoft Power BI Dashboards Deliver Real Time Data.” The idea seems to be that Microsoft is in the real time data analysis poker game as well. The write up reveals:

Power BI’s real-time dashboards — known as Real-Time Dashboard tiles — builds on the earlier Power BI REST APIs release to create real-time tiles within minutes. The tiles push data to the Power BI REST APIs from streams of data created in PubNub, a real-time data streaming service currently used widely for building web, mobile and IoT applications.

The idea is that a person knows the Microsoft methods, codes the Microsoft way, and has a stream of Big Data. The user then examines the outputs via “tiles,” which are updated in real time. As mentioned above, Microsoft is the Big Data Big Dog in the Gartner kennel. Obviously Microsoft will be price competitive: the service is priced at about $10 per month. The original price was about $40 a month, but cost cutting fever is raging in Redmond.
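Under the hood, the push model boils down to POSTing JSON rows at a dataset’s rows endpoint via the Power BI REST APIs. A rough sketch of building such a request (the dataset id, table name, row fields, and token are placeholders, not real values):

```python
import json
import urllib.request

# Sketch of pushing a row to a Power BI push dataset so a real-time tile
# updates. Dataset id, table name, and bearer token below are placeholders.
API = "https://api.powerbi.com/v1.0/myorg"

def build_push_request(dataset_id, table, rows, token):
    """Build (but do not send) the POST request for the dataset rows endpoint."""
    url = f"{API}/datasets/{dataset_id}/tables/{table}/rows"
    body = json.dumps({"rows": rows}).encode()
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Content-Type", "application/json")
    return req

req = build_push_request("my-dataset-id", "Telemetry",
                         [{"sensor": "s1", "value": 42.0}], "FAKE_TOKEN")
print(req.full_url)
```

Sending it with `urllib.request.urlopen(req)` (given a real dataset and token) is what makes the corresponding tile refresh within seconds.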

The question is, “Which of these services will dominate?” Who knows? Amazon has a business and a real time pitch which makes sense to those who have come to depend on the AWS services. Microsoft has business customers, Windows 10, and a reseller/consulting community eager to generate revenue.

My thought is, “Pick your horse, put down your bet, and head to the Real Time Data Analytics race track.” Tomorrow’s $100 ticket is only a few bucks today. The race to low cost entry fees is about to begin.

Stephen E Arnold, August 25, 2016

Can Analytics Be Cloud Friendly?

August 24, 2016

One of the problems with storing data in the cloud is that it is difficult to run analytics. Sure, you can run tests to determine cloud usage, but analyzing the data stored in the cloud is another story. Program developers have been trying to find a solution to this problem, and the open source community has developed some software that might be the ticket. Ideata wrote about the newest Apache software in “Apache Spark: Comparing RDD, Dataframe, and Dataset.”

Ideata is a data software company, and it built many of its headlining products on the open source software Apache Spark. The company has been using Apache Spark since 2013 and enjoys it because it offers rich abstractions, allows developers to build complex workflows, and makes data analysis easy.

Apache Spark works like this:

Spark revolves around the concept of a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel. An RDD is Spark’s representation of a set of data, spread across multiple machines in the cluster, with API to let you act on it. An RDD could come from any datasource, e.g. text files, a database via JDBC, etc. and can easily handle data with no predefined structure.

It can be used as the basis for a user-friendly cloud analytics platform, especially if you are familiar with what can go wrong with a dataset.
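The RDD model the quote describes, a dataset split across partitions, each transformed independently and then combined, can be sketched in plain Python (this mimics the model for illustration; it is not Spark’s API and has none of Spark’s fault tolerance):

```python
# Conceptual sketch of the RDD model: split data into partitions, transform
# each partition independently (as if on separate machines), then merge the
# partial results. Plain Python, not the Spark API.
def partition(data, n):
    """Split data into n roughly equal partitions."""
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def map_partitions(partitions, fn):
    """Apply fn to every element of every partition independently."""
    return [[fn(x) for x in part] for part in partitions]

def reduce_partitions(partitions, fn, zero):
    """Reduce each partition locally, then merge the partial results."""
    partials = []
    for part in partitions:
        acc = zero
        for x in part:
            acc = fn(acc, x)
        partials.append(acc)
    result = zero
    for p in partials:
        result = fn(result, p)
    return result

parts = partition(list(range(1, 11)), 3)
squared = map_partitions(parts, lambda x: x * x)
total = reduce_partitions(squared, lambda a, b: a + b, 0)
print(total)  # sum of squares of 1..10
```

In real Spark the equivalent is roughly `sc.parallelize(range(1, 11)).map(lambda x: x * x).reduce(lambda a, b: a + b)`, with the partitioning, shuffling, and recovery from lost partitions handled by the framework.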

Whitney Grace, August 24, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Technology That Literally Can Read Your Lips (Coming Soon)

August 19, 2016

The article on Inquisitr titled Emerging New “Lip-Reading” Technology To Radically Revolutionize Modern-Day Crime Solving explains the advances in visual speech recognition technology. In 1974 Gene Hackman could have used this technology in the classic film “The Conversation” where he plays a surveillance expert trying to get better audio surveillance in public settings where background noise makes clarity almost impossible. Apparently, we haven’t come very far since the 70s when it comes to audio speech recognition, but recent strides in lip reading technology in Norwich have experts excited. The article states,

“Lip-reading is one of the most challenging problems in artificial intelligence so it’s great to make progress on one of the trickier aspects, which is how to train machines to recognize the appearance and shape of human lips.” A few years ago, German researchers at the Karlsruhe Institute of Technology claimed they had introduced a lip-reading phone that allowed for soundless communication, a development that was to mark a massive leap forward into the future of speech technology.
The article concludes that while progress has been made, there is still a great deal of ground to cover. The complications inherent in recognizing, isolating, and classifying lip movement patterns makes this work even more difficult than audio speech recognition, according to the article. At any rate, this is good news for some folks who want to “know” what is in a picture and what people say when there is no audio track.

Chelsea Kerwin, August 19, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

There is a Louisville, Kentucky Hidden /Dark Web meet up on August 23, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233019199/

HonkinNews for August 16, 2016

August 16, 2016

The weekly news program about search, online, and content processing is now available at https://youtu.be/mE3MGlmrUWc. In addition to comments about Goo!Hoo, IBM, and Microsoft, you will learn about grilling squirrel over a wood fire. Live from Harrod’s Creek.

Stephen E Arnold, August 16, 2016

IBM’s Champion Human Resources Department Announces “Permanent” Layoff Tactics

August 16, 2016

The article on Business Insider titled Leaked IBM Email Says Cutting “Redundant” Jobs Is a “Permanent and Ongoing” Part of Its Business Model explores the language and overall human resources strategy of IBM. IBM personnel in the Netherlands learned in the email that layoffs are coming, and also that layoffs will be a regular aspect of how IBM “optimizes” its workforce. The article tells us,

“IBM isn’t new to layoffs, although these are the first to affect the Netherlands. IBM’s troubled business units, like its global technology services unit, are shrinking faster than its booming businesses, like its big data/analytics, machine learning (aka Watson), and digital advertising agency are growing…All told, IBM eliminated and gained jobs in about equal numbers last year, it said. It added about 70,000 jobs, CEO Rometty said, and cut about that number, too.”

IBM seems to be performing a balancing act that involves gaining personnel in areas like data analytics while shedding employees in areas that are less successful, or “redundant.” This allows the company to break even, although the employees it fires might feel that Watson itself could have delivered the news more gracefully and with more tact than the IBM HR department did. At any rate, we assume that IBM’s senior management asked Watson what to do and that this permanent layoffs strategy was the informed answer provided by the supercomputer.


Chelsea Kerwin, August 16, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Yippy Revealed: An Interview with Michael Cizmar, Head of Enterprise Search Division

August 16, 2016

In an exclusive interview, Yippy’s head of enterprise search reveals that Yippy launched an enterprise search technology that Google Search Appliance users are converting to now that Google is sunsetting its GSA products.

Yippy has also set its sights on the rest of the high-growth market for cloud-based enterprise search. Not familiar with Yippy, its IBM tie up, and its implementation of the Velocity search and clustering technology? Yippy’s Michael Cizmar gives some insight into this company’s search-and-retrieval vision.

Yippy (OTC PINK: YIPI) is a publicly traded company providing search, content processing, and engineering services. The company’s catchphrase is, “Welcome to your data.”

The core technology is the Velocity system, developed by Carnegie Mellon computer scientists. Yippy had obtained rights to the Velocity technology before IBM acquired Vivisimo. I learned from my interview with Mr. Cizmar that IBM is one of the largest shareholders in Yippy. Other facets of the deal included some IBM Watson technology.

This year (2016) Yippy purchased one of the most recognized firms supporting the now-discontinued Google Search Appliance. Yippy has been tallying important accounts and expanding its service array.


Michael Cizmar, head of Yippy’s enterprise search division

Beyond Search interviewed Michael Cizmar, the head of Yippy’s enterprise search division. Cizmar founded MC+A and built a thriving business around the Google Search Appliance. Google stepped away from on premises hardware, and Yippy seized the opportunity to bolster its expanding business.

I spoke with Cizmar on August 15, 2016. The interview revealed a number of little known facts about a company which is gaining success in the enterprise information market.

Cizmar told me that when the Google Search Appliance was discontinued, he realized that the Yippy technology could fill the void and offer more effective enterprise findability.  He said, “When Yippy and I began to talk about Google’s abandoning the GSA, I realized that by teaming up with Yippy, we could fill the void left by Google, and in fact, we could surpass Google’s capabilities.”

Cizmar described the advantages of the Yippy approach to enterprise search this way:

We have an enterprise-proven search core. The Vivisimo engineers leapfrogged the technology dating from the 1990s which forms much of Autonomy IDOL, Endeca, and even Google’s search. We have the connector libraries that we acquired from Muse Global. We have used the security experience gained via the Google Search Appliance deployments and integration projects to give Yippy what we call “field level security.” Users see only the part of content they are authorized to view. Also, we have methodologies and processes to allow quick, hassle-free deployments in commercial enterprises to permit public access, private access, and hybrid or mixed system access situations.
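The “field level security” notion Cizmar describes can be reduced to a simple rule: strip fields the user’s role is not entitled to before a result leaves the system. A toy sketch (roles, field names, and the sample document are all invented, not Yippy’s implementation):

```python
# Toy sketch of field-level security: filter each result document down to
# the fields a user's role may see. Roles and fields are invented examples.
FIELD_ACL = {
    "public":  {"title", "summary"},
    "analyst": {"title", "summary", "body"},
    "admin":   {"title", "summary", "body", "salary"},
}

def redact(document, role):
    """Return only the fields the given role is authorized to view."""
    allowed = FIELD_ACL.get(role, set())
    return {k: v for k, v in document.items() if k in allowed}

doc = {"title": "Org Review", "summary": "Q3", "body": "...", "salary": 90000}
print(redact(doc, "public"))
```

In a production engine the allowed-field set would come from the identity system at query time rather than a hard-coded table, but the filtering step is the same.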

With the buzz about open source, I wanted to know where Yippy fit into the world of Lucene, Solr, and the other enterprise software solutions. Cizmar said:

I think the customers are looking for vendors who can meet their needs, particularly with security and smooth deployment. In a couple of years, most search vendors will be using an approach similar to ours. Right now, however, I think we have an advantage because we can perform the work directly….Open source search systems do not have Yippy-like content intake or content ingestion frameworks. Importing text or an Oracle table is easy. Acquiring large volumes of diverse content continues to be an issue for many search and content processing systems…. Most competitors are beginning to offer cloud solutions. We have cloud options for our services. A customer picks an approach, and we have the mechanism in place to deploy in a matter of a day or two.

Connecting to different types of content is a priority at Yippy. Even though the company has a wide array of import filters and content processing components, Cizmar revealed that Yippy has “enhanced the company’s connector framework.”

I remarked that most search vendors do not have a framework, relying instead on expensive components licensed from vendors such as Oracle and Salesforce. He smiled and said, “Yes, a framework, not a widget.”

Cizmar emphasized that the Yippy IBM Google connections were important to many of the company’s customers, and that Yippy has also acquired the Muse Global connectors and the ability to build connectors on the fly. He observed:

Nobody else has Watson Explorer powering the search, and nobody else has the Google Innovation Partner of the Year deploying the search. Everybody tries to do it. We are actually doing it.

Cizmar made an interesting side observation. He suggested that Internet search needed to be better. Is indexing the entire Internet in Yippy’s future? Cizmar smiled. He told me:

Yippy has a clear blueprint for becoming a leader in cloud computing technology.

For the full text of the interview with Yippy’s head of enterprise search, Michael Cizmar, navigate to the complete Search Wizards Speak interview. Information about Yippy is available at http://yippyinc.com/.

Stephen E Arnold, August 16, 2016

Mixpanel Essay Contains Several Smart Software Gems

August 11, 2016

I read “The Hard Thing about Machine Learning.” The essay explains the history of machine learning at Mixpanel. Mixpanel is a business analytics company. Embedded in the write up are several observations which I thought warranted highlighting.

The first point is the blunt reminder that machine learning requires humans—typically humans with specialist skills—to make smart software work as expected. The humans have to figure out what problem they and the numerical recipes are supposed to solve. Mixpanel says:

machine learning isn’t some sentient robot that does this all on its own. Behind every good machine learning model is a team of engineers that took a long thoughtful look at the problem and crafted the right model that will get better at solving the problem the more it encounters it. And finding that problem and crafting the right model is what makes machine learning really hard.

The second pink circle in my copy of the essay corralled this observation:

The broader the problem, the more universal the model needs to be. But the more universal the model, the less accurate it is for each particular instance. The hard part of machine learning is thinking about a problem critically, crafting a model to solve the problem, finding how that model breaks, and then updating it to work better. A universal model can’t do that.

I think this means that machine learning works on quite narrow, generally tidy problems. Anyone who has worked with the mid 1990s Autonomy IDOL system knows that as content flows into a properly trained system, that “properly trained” system can start to throw some imprecise and off-point outputs. The fix is to retrain the system on a properly narrowed data set. Failure to do this would cause users to scratch their heads because they could not figure out how their query about computer terminals generated outputs about railroad costs. The key is the word “terminal” and increasingly diverse content flowing into the system.
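The “terminal” problem can be shown with a toy model: a classifier trained on a narrow, computing-only corpus will route railroad content to the wrong topic until it is retrained on data covering both senses (all training snippets below are invented for illustration):

```python
from collections import Counter

# Toy demonstration of the ambiguity problem described above: a model
# trained on a narrow corpus associates "terminal" only with computing,
# so railroad content flowing in later is routed to the wrong topic.
training = {
    "computing": ["terminal emulator shell login", "serial terminal console"],
    # Note: no railroad documents in the original, narrow training set.
}

def train(corpus):
    """Build per-topic word counts from {topic: [documents]}."""
    return {topic: Counter(w for doc in docs for w in doc.split())
            for topic, docs in corpus.items()}

def classify(model, text, default="unknown"):
    """Assign text to the topic whose vocabulary overlaps it the most."""
    scores = {t: sum(c[w] for w in text.split()) for t, c in model.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

model = train(training)
print(classify(model, "freight terminal rail yard"))   # wrongly "computing"

# Retraining on a corpus that covers both senses fixes the routing.
training["railroads"] = ["freight terminal rail yard", "rail terminal costs"]
model = train(training)
print(classify(model, "freight terminal rail yard"))   # now "railroads"
```

Real systems like IDOL use far richer statistics than raw word overlap, but the failure mode and the retraining remedy are the same.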

The third point received a check mark from this intrepid reader:

Correlation does not imply causation.

Interesting. I think one of my college professors in 1962 made a similar statement. Pricing for Mixpanel begins at $600 per month for four million data points.

Stephen E Arnold, August 11, 2016
