Stanford Offers Course Overviewing Roots of the Google Algorithm

March 23, 2016

The syllabus for Stanford’s computer science course CS 349: Data Mining, Search, and the World Wide Web, posted on Stanford.edu, provides an overview of some of the technologies and advances that led to Google search. The syllabus states:

“There has been a close collaboration between the Data Mining Group (MIDAS) and the Digital Libraries Group at Stanford in the area of Web research. It has culminated in the WebBase project whose aims are to maintain a local copy of the World Wide Web (or at least a substantial portion thereof) and to use it as a research tool for information retrieval, data mining, and other applications. This has led to the development of the PageRank algorithm, the Google search engine…”

The syllabus alone offers useful insights that could help students and laypeople understand the roots of Google search. Key inclusions are the Digital Equipment Corporation (DEC) and PageRank, the algorithm named for Larry Page that enabled Google to become Google. The algorithm ranks web pages based on how many other pages link to them and on how highly ranked those linking pages are. Jon Kleinberg also played a key role by recognizing that pages that link out to many others (like a search engine) should also be seen as important. The larger context of the course is data mining and information retrieval.
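For readers who want to see the link-counting idea in action, here is a minimal sketch of a PageRank-style calculation. It is not Google's production code or anything taken from the syllabus: the function name `pagerank`, the damping value of 0.85, the fixed iteration count, and the four-page `toy_web` are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate a rank for each page by repeatedly passing rank along links.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b, and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
print(best, ranks)
```

In this toy web, page c collects the most inbound links and comes out on top, while page a outranks b and d because its single inbound link comes from the highly ranked c.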

 

Chelsea Kerwin, March 23, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Interview with Stephen E Arnold, Reveals Insights about Content Processing

March 22, 2016

Nikola Danaylov of the Singularity Weblog interviewed technology and financial analyst Stephen E. Arnold on the latest episode of his podcast, Singularity 1 on 1. The interview, Stephen E. Arnold on Search Engines and Intelligence Gathering, offers thought-provoking ideas on important topics related to sectors — such as intelligence, enterprise search, and financial — which use indexing and content processing methods Arnold has worked with for over 50 years.

Arnold attributes the origins of his interest in technology to a programming challenge he sought and accepted from a computer science professor, outside the realm of his college major, English. His focus on creating actionable software and his affinity for problem-solving of any nature led him to leave PhD work for a job with Halliburton Nuclear. His career includes employment at Booz, Allen & Hamilton, the Courier Journal & Louisville Times, and Ziff Communications, before he started ArnoldIT.com strategic information services in 1991. He co-founded and sold a search system to Lycos, Inc., worked with numerous organizations, including intelligence and enforcement organizations such as the US Senate Police and the General Services Administration, and authored seven books and monographs on search-related topics.

With a continued emphasis on search technologies, Arnold began his blog, Beyond Search, in 2008 aiming to provide an independent source of “information about what I think are problems or misstatements related to online search and content processing.” Speaking to the relevance of the blog to his current interest in the intelligence sector of search, he asserts:

“Finding information is the core of the intelligence process. It’s absolutely essential to understand answering questions on point and so someone can do the job and that’s been the theme of Beyond Search.”

As Danaylov notes, the concept of search encompasses several areas where information discovery is key for one audience or another, whether counter-terrorism, commercial, or other purposes. Arnold agrees,

“It’s exactly the same as what the professor wanted to do in 1962. He had a collection of Latin sermons. The only way to find anything was to look at sermons on microfilm. Whether it is cell phone intercepts, geospatial data, processing YouTube videos uploaded from a specific IP address, exactly the same problem and process. The difficulty that exists is that today we need to process data in a range of file types and at much higher speeds than ever anticipated, but the processes remain the same.”

Arnold explains the iterative nature of his work:

“The proof of the value of the legacy is I don’t really do anything new, I just keep following these themes. The Dark Web Notebook is very logical. This is a new content domain. And if you’re an intelligence or information professional, you want to know, how do you make headway in that space.”

Describing his most recent book, Dark Web Notebook, Arnold calls it “a cookbook for an investigator to access information on the Dark Web.” This monograph includes profiles of little-known firms which perform high-value Dark Web indexing and follows a book he authored in 2015 called CYBEROSINT: Next Generation Information Access.


Change Is Hard, Especially in the User Interface

March 22, 2016

One of the most annoying things in life is going to the grocery store and noticing that the entire place has been rearranged since your last visit. I always ask myself, “Why, grocery store people, did you do this to me?” Part of the reason is to improve the shopping experience and product exposure, while the other part is to screw with customers (I cannot confirm the latter). The Fuzzy Notepad blog, with its Pokémon Eevee mascot, explains in the post titled “We Have Always Been At War With UI” that programmers and users have always been at war with each other when it comes to the user interface.

Face it, Web sites (and other areas of life) need to change to maintain their relevancy. The biggest problem related to UI changes is the rollout of said changes. The post points out that users get confused and spend hours trying to understand the change. Sometimes the change is announced; other times it is applied only to a certain number of users.

The post lists several UI changes and describes how they were handled, along with the programming behind them. One constant thread running through the post is that users simply hate change, but the inevitable question of “Why?” pops up.

“Ah, but why? I think too many developers trot this line out as an excuse to ignore all criticism of a change, which is very unhealthy. Complaints will always taper off over time, but that doesn’t mean people are happy, just that they’ve gone hoarse. Or, worse, they’ve quietly left, and your graphs won’t tell you why. People aren’t like computers and may not react instantly to change; they may stew for a while and drift away, or they may join a mass exodus when a suitable replacement comes along.”

Big data can measure anything and everything, but the data can be interpreted for or against the changes. Even worse, the analysts may not know what exactly they need to measure. What can be done to avoid total confusion about changes is to have a plan, let users know in advance, and even create a tutorial about how to use the changes. If worse comes to worst, the change can be rolled back and we move on.

 

Whitney Grace, March 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

For Sale: Your Bank Information

March 21, 2016

Among the common commodities for sale on the Dark Web are bank details, credit card numbers, social security numbers, and other personal information. This information can sell for a few bucks to hundreds of dollars depending on its quality and quantity. To buy personal information, interested parties usually must journey to the Dark Web, but the International Business Times reports in “Confidential Bank Details Available For Sale On Easily Found Web Site” that such information is for sale on the general Web for as little as a couple of pounds (or dollars for the US folks). The Web site had a pretty simple setup: interested parties register, and then they have access to the stolen information for sale.

Keith Vaz, chairman of the home affairs select committee, wants the National Crime Agency (NCA) to use its power and fulfill its purpose to shut the Web site down.

“A statement from the NCA said: ‘We do not routinely confirm or deny investigations nor comment on individual sites. The NCA, alongside UK and international law enforcement partners and the private sector, are working to identify and as appropriate disrupt websites selling compromised card data. We will work closely with partners of the newly established Home Office Joint Fraud Task Force to strengthen the response.’”

Online scams are getting worse and more effective at stealing people’s information. Overall, British citizens reportedly lost a total of 670 million pounds (or $972 million). The government, however, believes the total losses are closer to 27 billion pounds (or $39.17 billion).

Scams are getting worse because the criminals behind them are getting smarter and know how to get around security defenses. Users need to wise up and learn about the Dark Web, take better steps to protect their information, and educate themselves on how to recognize scams.

 

Whitney Grace, March 21, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

The New Google? Instagram

March 19, 2016

I read “Google is Like for Oldies. Instagram Is the New Google.” The source of this insight is Alia Bhatt, a person associated with Bollywood. The write up states:

Being in her early 20s, Alia Bhatt though may not be the most intellectual actor on earth, but she is definitely an actress who has earned professional success and fans. Being considered as one of the youth icons, Alia is also extremely fond of social media networking sites and obviously she is a frequenter on Instagram.

The write up adds:

Alia Bhatt, who is a self-confessed Instagram lover, has over 5.3 million followers and her account is filled with pictures from her film’s promotions, magazine covers to dubsmash videos to selfies with friends and lots more on food and her love for pets. As her co-star of Kapoor & Sons Sidharth Malhotra maintains, this generation creates their own new world within their smartphones. “It’s a generational thing. People our age are always on the phone – Instagraming, Tweeting, Whataspping and because we are in this world we have to cater it,” he added.

Google, Instagram is the new you, just without the balloons, the self driving autos, and the solving death stuff. Alert your AI systems, please, to the new lingo like “instagraming.”

Stephen E Arnold, March 19, 2016

Artificial Intelligence Fun: The Amazon Speech Recognition Function

March 18, 2016

I read “Amazon’s Alexa Went Bonkers, Reset User’s Thermostat.” Alexa is an Amazon smart product. The idea is that one talks to it in order to perform certain home automation tasks. Hey, it is tough to punch the button on a stereo system. Folks are really busy these days.

According to the write up:

one of the things Alexa apparently cannot do quite so well is determine who her master is. During a recent NPR broadcast about Alexa and the Echo, listeners at home noticed strange activity on their own Echo devices. Any time the radio reporter gave an example of an Alexa command, several Alexas across the country pricked up their ears and leapt into action — with surprising results.

There you go. A smart device which is unable to figure out which human voice to obey.

Here is one of the examples cited in the write up:

“Listener Roy Hagar wrote in to say our story prompted his Alexa to reset his thermostat to 70 degrees,” wrote NPR on a blog recounting the tale.

Smart devices with intelligence do not—I repeat—run into objects nor do they change thermostat settings. Humans are at fault. When one uses a next generation search system to identify the location of a bad actor, nothing will go wrong.

Stephen E Arnold, March 18, 2016

Google Decides to Be Nice to

March 18, 2016

Google is renowned for its technological endeavors, beautiful office campuses, and smart employees, as well as for being a company full of self-absorbed and competitive people. While Google might have a lot of perks, it also has its dark side. According to Quartz, Google wanted to build more productive teams, so it launched Project Aristotle to analyze how. The finding is summed up in the article’s title: “After Years Of Intensive Analysis, Google Discovers The Key To Good Teamwork Is Being Nice.”

Project Aristotle studied hundreds of employees in different departments and analyzed their data. The researchers wanted to find a “magic formula,” but it all boils down to one of the things taught in kindergarten: be nice.

“Google’s data-driven approach ended up highlighting what leaders in the business world have known for a while; the best teams respect one another’s emotions and are mindful that all members should contribute to the conversation equally. It has less to do with who is in a team, and more with how a team’s members interact with one another.”

The best teams are made up of members who understand and respect each other and allow everyone to contribute to the conversation equally. It is a basic human tenet and even one of the better ways to manage a relationship, according to marriage therapists around the world. Another result of the project is dubbed “psychological safety,” where team members create an environment with the established belief that they can take risks and share ideas without ridicule.

Will psychological safety be a new buzzword since Google has “discovered” that being nice works so well?  The term has been around for a while, at least since 1999.

Google’s research yields a business practice that other companies have already adopted, including Costco, Trader Joe’s, Pixar, and Sassie, to name a few. Yet why is it so hard to be nice?

 

Whitney Grace, March 18, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Search: Gone and Replaced. A Research Delight

March 17, 2016

The notion of indexing “all the world’s information” is an interesting one. I am amused by the assumption some folks make that Bing, Google, and Yandex index “every” Web site and “all” content.

I read “China Has Unblocked Internet Searches That Refer to Kim Jong Un As a ‘Pig’.” The article is a reminder that finding information can be a very difficult business.

According to the write up from an outfit rumored to be interested in some of the Yahooligans’ online business, I learned:

China appears to have made an exception within its extremely restricted Internet this week, for an unusual search term — a reference to North Korean dictator Kim Jong Un as a “third-generation pig.”

What other items are back online? Heck, what books are available in digital form in any country? I do find the animal reference interesting, however. I am baffled by the concept of third-generation.

When you run a query, do you get access to “all” information, or is the entire digital information access environment subject to filtering? Maybe third-generation filtering?

Stephen E Arnold, March 17, 2016

A Dead Startup Tally Sheet

March 17, 2016

“Startup” is the buzzword for a company that is starting up in the tech industry, usually with an innovative idea that garners it several million in investments. Some startups are successful, others plod along, and many simply fail. CB Insights makes an interesting (and valid) comparison between tech startups and the dot-com bust, which fizzled out quicker than a faulty firecracker.

While most startups appear to be run by competent teams, sometimes they fizzle out or are acquired by a larger company. Many of them will not make it as a headlining company. As a result, CB Insights created “The Downround Tracker: Which Companies Are Not Living Up To The Expectations?”

CB Insights named this tech boom the “unicorn era,” probably for the rare and mythical sightings of some of these companies. The Downround Tracker tracks unicorn era startups that have folded or were purchased. Since 2015, fifty-six companies have made the Downround Tracker list, including LiveScribe, Fab.com, Yodle, Escrow.com, eMusic, Adesto Technologies, and others.

Browse through the list and some of the names will be familiar, while others will make you wonder what these companies did in the first place. Companies come and go at a pace that appears to be quicker than in any other generation. At least it shows that human ingenuity is still working. Cue Kansas’s “Dust in the Wind.”

 

Whitney Grace, March 17, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

HP Enterprise: Is Haven Autonomy IDOL after a Project Runway Touch Up?

March 16, 2016

Short honk: I read “HPE Launches Machine-Learning-As-a-Service on Microsoft Azure.” The hook for me was the pricing for a new cloud search and content processing service. I did not understand the approach; for example, what the heck is an “API unit”?


But what caused me to jot down this note was this list of HPE Haven OnDemand functions. Here’s the list I circled:

  • Advanced Text Analysis, which pulls concepts and sentiment from text.
  • Format conversion, which converts data wherever it lives.
  • Search tools across on-premises or cloud data.
  • Image recognition and face detection.
  • Knowledge graph analysis.
  • Pattern and speech recognition.

Based on my sketchy knowledge about Autonomy IDOL, this list seems to be a summary of Autonomy’s integrated data operating features. Most of these were added to the IDOL platform in the years before HP paid $11 billion for the 1998 system which, to be fair, had been upgraded in the intervening years.

The list also reminded me of some of the functions I associated with “augmented intelligence,” a niche currently occupied by outfits like Palantir and IBM i2.

In terms of pricing, the Palantir Hobbits charge for a license, training, support, and some other goodies. But the pricing is not variable. The IBM i2 folks deliver a collection of options and each option has a price tag.

HPE’s pricing is a bit of a mystery. How many API units fit on the head of a Big Data project? Whittling down that $11 billion investment suggests that the API units may be more expensive than the monthly fees suggest; for example, “the introductory offer offers 50,000 API units and 15 Resource Units for [the] first three months for all paid plans.” What’s a “Resource Unit”?

The write up raises more questions than it answers in my opinion. I wonder how Autonomy IDOL will look in fall fashions?

Stephen E Arnold, March 16, 2016
