
VK.com: An Alternative to Facebook

May 27, 2016

VK is the name of the “old” VKontakte social networking service. My estimates peg the traffic to the site at about 30 percent of Facebook’s. The user count is in the 300 million range and growing. The user base is concentrated in Russia, but the service is attracting users from other countries. Online translation tools make it easy for a non-Russian speaker to use the service.

Earlier this year, I read “German Neo Nazis Flocking to Putin’s Facebook Knock off VKontakte.” The write up seems a bit one sided, but social network sites allegedly linked to Mr. Putin suggest that a bit of additional research and investigation is warranted.

You can sign up and explore VK.com at www.vk.com. I provide some basics for appropriate prophylactic measures in my Dark Web lectures. One thought is okay here: Be prudent.

You will need a VK.com account to access an interesting facial recognition service called FindFace. The site looks like this:

image

Like Google’s “search by photo”, the FindFace service delivers close matches to the face you upload to the service. The Guardian published “Face Recognition App Taking Russia by Storm May Bring End to Public Anonymity.” The digital wannabe stated:

FindFace compares photos to profile pictures on social network Vkontakte and works out identities with 70% reliability.

I mention VK.com and FindFace because I was asked if there were an alternative to Facebook. The answer is, “VK.com.” However, the use of the service for certain types of groups and certain purposes is less easy than it was in the past. Some folks can use the VK.com apps and features instead of fooling around with Dark Web services.

Stephen E Arnold, May 27, 2016

The Reinterpretation of Google History

May 27, 2016

I read “Why Google Beat Yahoo in the War for the Internet.” The information in the article touches upon some important points; for example, Google focused on a more homogeneous infrastructure. The history of Google, however, includes some tactical moves which the article ignores.

My enthusiasm for recycling the information about Google’s first five years has shriveled since I published the third volume of my Google trilogy. I want to point out several factoids which, no doubt, will interest few today. The article makes much of what is described as a “fresh start.” I do not agree with the “fresh” part.

First, Google is a descendant of Alta Vista, Jon Kleinberg’s Clever, and the information access research conducted at Stanford University and other universities with an interest in this technical field. As a result, the infrastructure benefits from Digital Equipment’s investment in its Alta Vista system. Much of that “knowledge” migrated to Google as Messrs. Brin and Page hired notable professionals away from the chaos of Alta Vista under Hewlett Packard’s management. Jeff Dean, Simon Tong, and others are responsible for much of the infrastructure for Google. The Alta Vista system was anchored in the DEC technology. The memory management benefits were obtained at a cost. Google embraced commodity hardware and a big chunk of the Alta Vista thinking. Fresh? Well, sort of.

Second, Google’s scrutiny of Yahoo had a couple of payoffs. Yahoo was a crazy quilt of warring tribes. Each tribe had its own technology idols. Google interpreted this as expensive and focused on reducing the costs by standardizing on systems and methods to a greater degree than Yahoo did. Over time, Yahoo became more sluggish due to its different fiefdoms. Google was comparatively stronger due to its less chaotic approaches. Don’t get me wrong. Google in its first five years was a wild and crazy outfit. Yahoo was wilder and crazier. As part of Google’s learning from Yahoo, Google recognized the value of selling ads the Yahoo way. Yahoo was unhappy with Google’s borrowing of its GoTo.com/Overture approach. Google settled a legal spat with about $1 billion in payments to the Yahooligans. But the majority of Google’s revenue comes from that GoTo.com/Overture me too play.

Third, Google, like Yahoo, is not sure what it will be from year to year. The difference is that Google has crafted a relatively consistent flow of advertising revenue from its early and somewhat crude pre-Oingo days. Google integrated acquired technologies more effectively than Yahoo typically did. The ability to integrate provided Google an important edge.

There are other touchpoints in Google’s early days. From my point of view, Google is from its inception a beneficiary of good luck because the competitors in Web search were distracted in an effort to become portals. Google, as I see the company, is less of an innovator and more of an emulator. Google has yet to demonstrate that renaming the company, reorganizing the units, and funding projects like cheating death will yield the next big thing.

Google, for me, was a one off, an anomaly.

Stephen E Arnold, May 27, 2016

Inc Magazine Explains Search. Really.

May 26, 2016

I read “How the World of Search Looks Like. Really.” [sic]

Now that is fine syntax. Perhaps the savvy Inc editor is confused about the Strunk & White comments about the use of “what”? Really.

The write up is even more orthogonal than the headline’s word choice.

An expert in search, who works at Gravity Media, has focused his attention on information access. Now information access is a nebulous concept. Search is a bit less difficult to define if you are, like me, pushing 72 years in age. Log on to an online system. Enter a keyword. Review the matches the system generates via brute force look up. See, easy, really.
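For readers who want to see how little machinery the old fashioned approach needs, here is a minimal sketch of brute force keyword look up. The function name and the three sample records are hypothetical, invented for illustration; no commercial online system is being reproduced here.

```python
def brute_force_search(documents, keyword):
    """Scan every document and return the ids of those containing the keyword."""
    needle = keyword.lower()
    matches = []
    for doc_id, text in documents.items():
        # No index: each query reads every document from start to finish.
        if needle in text.lower():
            matches.append(doc_id)
    return matches


# Hypothetical records, made up for illustration.
documents = {
    "doc1": "Alta Vista indexed the early Web.",
    "doc2": "Keyword search returns matching records.",
    "doc3": "Portals distracted the search vendors.",
}

print(brute_force_search(documents, "search"))
```

The cost of this approach grows with the size of the collection, which is why production systems build an index first. The sketch only shows the idea.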

I learned in the write up that my Abe Lincoln learnings are hopelessly out of whack.

I noted this passage:

While young Snapchatters who grew up in the midst of the evolving Web may prefer to Google search, the later-adopting Baby Boomers may very well be using Yahoo search.

Okay. Snapchatters. How does one “find” information via Snapchat?

I noted this statement:

Globally, quite a few other competitors are making good old Google sweat a bit. Tell me, are you feeling lucky? (My poor attempt at a Google joke…) Internationally, people are Yandexing, Baiduing, Yahooing, and the list goes on and on.

Ha. Ha. Really.

Then a statement which blindsided me. People in different countries search for information in ways different from those used in the US:

As the globe continues to shrink in the wake of the World Wide Web, these cultural nuances are something international brands should consider when trying to capture global audiences. Up until now there has been little attention paid to this increasing trend of the “other” search networks.

Right. Little attention. I assume those ads on Baidu for products not from China are outliers?

I circled:

Chinese-Americans’ searches will likely use a combination of Chinese and English search terms depending on what their level of comfort is with translation. This same fact is the reason a first generation Chinese millennial living in the U.S. would choose to utilize both Baidu and Google, depending on what they are searching.

My approach is to search for Chinese information in Chinese. I don’t read or speak Chinese, but I have team members who do. If one of these people sends me a link to a document in a language other than English, I use various online translation systems to get the main idea. Then I pick up the phone and talk with the native speaker about the information.

I completed the article with a big blue exclamation point:

In conclusion, the truth is that there is very little data on the Internet related to global search trends and user preferences. If the Internet has taught us one thing it is that being more visible on the Web is always to the benefit of the marketer. So if your brand has not been running search campaigns across more networks than just Google, now is the time to start. The insights that can be derived from a test campaign alone can reveal hugely important details related to the search habits of your target audience. Even if the outcome of a Yahoo paid search campaign reaffirms that a strictly Google campaign is the way to reach your brand’s target audience, there is only one way to find out – test it.

It seems, gentle reader, that the article is less about the search thing and more about the marketing services thing. That’s okay. Little wonder that niche search engines are poking their noses into the big, uncertain world. One can now search for gifs at GifMe or Giphy for this reason.

Back to Inc. What the heck is the editorial policy at Inc.? Wonky word choice and an article about search which does not address the topic of what the world of search looks like. Looks like content marketing at best and editorial shortcutting from my vantage point in rural Kentucky. Really.

Stephen E Arnold, May 26, 2016

The Internet Archive: How It Works

May 26, 2016

I have noted that the interface for the Internet Archive is interesting. For me, the system is almost unusable.

I read “The Technology Behind the World’s Worst DVR.” I think the write up does a good job of explaining some of the challenges the system presents to the developers and to the users. I learned that the political ad archive works like this:

image

Okay, WordPress. I am a bit fuzzy about the other icons, however.

The good news appears in this statement:

Over the coming months we are working to make the system more accurate, and exploring ways to get it so that it can automagically identify newly released political ads without any need for manual entry.

Worth monitoring.

Stephen E Arnold, May 26, 2016

JavaScript Code Search

May 25, 2016

The general purpose Web search systems are not particularly useful for narrow queries. As a result, developers who want to locate JavaScript code to perform a specific task have had to bang away at Bing, forums, Google, and odd duck discussions on open source code sites. I learned in “Find JavaScript Code Snippets by Functionality with Cocycles” that there is a niche search engine available. Navigate to Cocycles and run your query. According to the service’s Web site, additional languages will be added to the system in the near future. Worth a look.

Stephen E Arnold, May 25, 2016

Google Quote to Note: Search and Smart Software

May 24, 2016

I saw a Quora post by Peter Norvig, one of Alphabet Google’s wizards. [You may have to log in to view the statement. Also, the Quora search result for you may require some fiddling. Hey, life in the fast search lane is exciting.]

Norvig’s subject is search. I highlighted these statements from the most viewed writer in artificial intelligence:

“Modern” Google, as Sundar has set out the vision, is based not just on suggestions of relevant information, but on informing and assisting.

In short, Google will figure out what “you” really want. But what if I want to locate specific words and phrases? Well, too bad for me.

How about that precision and recall stuff? It seems Google has admitted the 80 percent ceiling for precision and recall. I circled:

With information retrieval, anything over 80% recall and precision is pretty good—not every suggestion has to be perfect, since the user can ignore the bad suggestions. With assistance, there is a much higher barrier.

Really? A glass ceiling which has been evident in the TREC results for what? A decade?

Does this suggest that when Google cannot solve a problem, it punts? What about that solving death thing?
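For readers who want the arithmetic behind the precision and recall figures in the quoted passage, here is a minimal sketch using the standard definitions: precision is the share of returned items that are relevant, and recall is the share of relevant items that were returned. The document ids are hypothetical and were chosen so that both numbers land on the 80 percent mark.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using the standard definitions."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant items the system actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and relevance judgments for a single query.
retrieved = ["d1", "d2", "d3", "d4", "d5"]
relevant = ["d1", "d2", "d3", "d4", "d6"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

In this made-up case, one of the five results is junk and one relevant document is missed, so both measures come out at 0.8.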

Stephen E Arnold, May 24, 2016

Australian Software Developer Revealed the Panama Papers

May 23, 2016

The Panama Papers have unleashed a slew of scandals that sent out ripples we will be dealing with for years to come. The leak also strikes another notch for the power of software and the notion that nothing is private anymore. But how were the Panama Papers analyzed? Reuters reports that a “Small Australian Software Firm Helps Join The Dots On The Panama Papers”.

Nuix Pty Ltd. is a Sydney-based software development company that donated its document analysis program to the International Consortium of Investigative Journalists (ICIJ) to delve through the data from Mossack Fonseca, the Panamanian law firm at the center of the leak. Reporters have searched through the data for some time and discovered within the 2.6 terabytes the names of politicians and public figures with questionable offshore financial accounts.

“By using the software, the Washington-based ICIJ was able to make millions of scanned documents, some decades old, text-searchable and help its network of journalists cross reference Mossack Fonseca’s clients across these documents.  The massive leak has prompted global investigations into suspected illegal activities by the world’s wealthy and powerful. Mossack Fonseca, the firm at the center of the leaks, denies any wrongdoing.  The use of advanced document and data analysis technology shows the growing importance of technology’s role in helping journalists make better sense of increasingly bigger news discoveries.”

Nuix Pty is a ten-year-old company, and its products have been used to conduct data analysis in investigations of child pornography rings, people trafficking, and high-end tax evasion. Another selling feature for the company is its dedication to its clients’ privacy. Nuix did not allow itself to have access to the information within the Panama Papers. That is an interesting fact, considering how some tech companies need to have total access to their clients’ information.

Nuix sounds like the Swiss bank of software companies, guaranteeing high-quality services and products that guarantee results, plus undeniable privacy.

 

Whitney Grace, May 23, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Search Sink Hole Identified and Allegedly Paved and Converted to a Data Convenience Store

May 20, 2016

I try to avoid reading more than one write up a day about alleged revolutions in content processing and information analytics. My addled goose brain cannot cope with the endlessly recycled algorithms dressed up in Project Runway finery.

I read “Ryft: Bringing High Performance Analytics to Every Enterprise,” and I was pleased to see a couple of statements which resonated with my dim view of information access systems. There is an accompanying video in the write up. I, as you may know, gentle reader, am not into video. I prefer reading, which is the old fashioned way to suck up useful factoids.

Here’s the first passage I highlighted:

Any search tool can match an exact query to structured data—but only after all of the data is indexed. What happens when there are variations? What if the data is unstructured and there’s no time for indexing? [Emphasis added]

The answer to the question is increasing costs for sales and marketing. The early warnings for amped up baloney are the presentations given at conferences and pumped out via public relations firms. (No, Buffy, no, Trent, I am not interested in speaking with the visionary CEO who hired you.)

I also highlighted:

With the power to complete fuzzy search 600X faster at scale, Ryft has opened up tremendous new possibilities for data-driven advances in every industry.

I circled the 600X. Gentle reader, I struggle to comprehend a 600X increase in content processing. Dear Mother Google has invested to create a new chip to get around the limitations of our friend Von Neumann’s approach to executing instructions. I am not sure Mother Google has this nailed because Mother Google, like IBM, announces innovations without too much real world demonstration of the nifty “new” things.

I noted this statement too:

For the first time, you can conduct the most accurate fuzzy search and matching at the same speed as exact search without spending days or weeks indexing data.

Okay, this strikes me as a capability I would embrace if I could get over or around my skepticism. I was able to take a look at the “solution” which delivers the astounding performance and information access capability. Here’s an image from Ryft’s engineering professionals:

image

Notice that we have Spark and pre built components. I assume there are myriad other innovations at work.

The hitch in the git along is that in order to deal with certain real world information processing challenges, the inputs come from disparate systems, each generating substantial data flows in real time.

Here’s an example of a real world information access and understanding challenge, which, as far as I know, has not been solved in a cost effective, reliable, or usable manner.

image

Image source: Plugfest 2016 Unclassified.

This unclassified illustration makes clear that the little things in the sky pump out lots of data into operational theaters. Each stream of data must be normalized and then converted to actionable intelligence.

The assertion about 600X sounds tempting, but my hunch is that the latency in normalizing, transferring, and processing will not meet the need for real time, actionable, accurate outputs when someone is shooting at a person with a hardened laptop in a threat environment.

In short, perhaps the spark will ignite a fire of performance. But I have my doubts. Hey, that’s why I spend my time in rural Kentucky where reasonable people shoot squirrels with high power surplus military equipment.
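For context on the exact versus fuzzy distinction in the Ryft passages quoted above, here is a minimal sketch that scans unindexed records both ways. It uses the `difflib` module from Python’s standard library as a stand-in similarity measure. It is not Ryft’s method, the company names are hypothetical, and the 0.8 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def exact_matches(records, query):
    """Return only the records identical to the query string."""
    return [record for record in records if record == query]


def fuzzy_matches(records, query, threshold=0.8):
    """Return records whose similarity ratio to the query meets the threshold."""
    results = []
    for record in records:
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        score = SequenceMatcher(None, record.lower(), query.lower()).ratio()
        if score >= threshold:
            results.append(record)
    return results


# Hypothetical records; the second contains a typo an exact match would miss.
records = ["Acme Holdings Ltd", "Acme Holdngs Ltd", "Zenith Partners"]

print(exact_matches(records, "Acme Holdings Ltd"))
print(fuzzy_matches(records, "Acme Holdings Ltd"))
```

The fuzzy pass scores every record against the query, so it does more work per record than the exact comparison. That extra work is the cost any 600X claim has to overcome.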

Stephen E Arnold, May 20, 2016

The Kardashians Rank Higher Than Yahoo

May 20, 2016

I avoid the Kardashians and other fame chasers, because I have better things to do with my time. I never figured that I would actually write about the Kardashians, but the phrase “never say never” comes into play. Vanity Fair’s “Marissa Mayer vs. ‘Kim Kardashian’s Ass’: What Sunk Yahoo’s Media Ambitions?” tells a bleak story about the current happenings at Yahoo.

Yahoo has ended many of its services, let go fifteen percent of staff, and there are very few journalists left on the team. The remaining journalists are not worried about producing golden content; they have to compete with a lot already on the Web, especially “Kim Kardashian’s ass” as they say.

When Marissa Mayer took over Yahoo as the CEO in 2012, she was determined to carve out Yahoo’s identity as a tech company. Mayer, however, also wanted Yahoo to be a media powerhouse, so she hired many well-known journalists to run specific niche projects in popular areas from finance to beauty to politics. It was not a successful move and now Yahoo is tightening its belt one more time. The Yahoo news algorithm did not mesh with the big name journalists; the hope was that their names would soar above popular content such as Kim Kardashian’s ass. They did not.

Much of Yahoo’s current worth comes from its Alibaba stake. The result is:

“But the irony is that Mayer, a self-professed geek from Silicon Valley, threw so much of her reputation behind high-profile media figures and went with her gut, just like a 1980s magazine editor—when even magazine editors, including those who don’t profess to “get” technology, have long abandoned that practice themselves, in favor of what the geeks in Silicon Valley are doing.”

Mayer was trying to create a premier media company, but lower quality content is more popular than top of the line journalists. The masses prefer junk food in their news.

 

Whitney Grace, May 20, 2016

Signs of Life from Funnelback

May 19, 2016

Funnelback has been silent as of late, according to our research, but the search company has emerged from the tomb with eyes wide open and a heartbeat. The Funnelback blog has shared some new updates with us. The first bit of news, “Searchless In Seattle? (AKA We’ve Just Opened A New Office!),” explains that Funnelback opened a new office in Seattle, Washington. The search company already has offices in Poland, United Kingdom, and New Zealand, but now it wants to establish a branch in the United States. Given its successful track record with the finance, higher education, and government sectors in the other countries, it stands a chance to offer more competition in the US. Seattle also has a reputable technology center and Funnelback will not have to deal with the Silicon Valley group.

The second piece of Funnelback news deals with “Driving Channel Shift With Site Search.”  Channel shift is the process of creating the most efficient and cost effective way to deliver information access and usage to users.  It can be difficult to implement a channel shift, but increasing the effectiveness of a Web site’s search can have a huge impact.

Being able to locate information on a Web site quickly and effectively not only saves time for more important tasks, but it can also drive sales, further reputation, etc.

“You can go further still, using your search solution to provide targeted experiences; outputting results on maps, searching by postcode, allowing for short-listing and comparison baskets and even dynamically serving content related to what you know of a visitor, up-weighting content that is most relevant to them based on their browsing history or registered account.

Couple any of the features above with some intelligent search analytics, that highlight the content your users are finding and importantly what they aren’t finding (allowing you to make the relevant connections through promoted results, metadata tweaking or synonyms), and your online experience is starting to become a lot more appealing to users than that queue on hold at your call centre.”

I have written about it many times, but a decent Web site search function can make or break a site. Not only does a poor search function demonstrate that the Web site is not professional, it also fails to inspire confidence in a business. It is a very big rookie mistake to make.

 

Whitney Grace, May 19, 2016
