CyberOSINT Update: No Fooling

April 1, 2015

Two quick items about cyber OSINT. These are not April Fool jokes and the information is available without nag screens, registration forms, or blinking ads.

First, we have posted a five minute video that explains what cyber OSINT means. I was interviewed by award winning tech journalist Ric Manning. You can view the video at this link.

Second, we have started a new interview series. Like the original Search Wizards Speak series of interviews, the Cyber Wizards Speak interviews provide more first-person information about cyber OSINT from those working in the field. The interviews are intended for those interested in law enforcement, intelligence, and security. The first interview in the series presents the viewpoints of Luca Scagliarini, one of the original developers of the Expert System Cogito system. You can find the interview at www.xenky.com/expert-system.

Watch for upcoming announcements about more cyber OSINT videos and interviews with the principals of BrightPlanet and Recorded Future.

Copies of my new study CyberOSINT: Next Generation Access are available at www.xenky.com/cyberosint.

Stephen E Arnold, April 1, 2015

Digital Shadows Searches the Shadow Internet

March 23, 2015

The deep Web is not hidden from Internet users, but regular search engines like Google and Bing do not index it in their results.  Security Affairs reported on a new endeavor to search the deep Web in the article, “Digital Shadows Firm Develops A Search Engine For The Deep Web.”  Memex and Flashpoint are two search engine projects that are already able to scan the deep Web.  Digital Shadows, a British cyber security firm, is working on another search engine specially designed to search the Tor network.

The CEO of Digital Shadows, Alistair Paterson, describes the project as Google for Tor.  According to the article:

“Digital Shadows developed the deep Web search engine to offer its services to private firms to help them identifying cyber threats or any other illegal activity that could represent a threat.”

While private firms will need and want this software to detect illegal activities, law enforcement officials currently need deep Web search tools more than those in other fields.  They use them to track fraud, drug and sex trafficking, robberies, and contraband.  Digital Shadows is creating a product that is part of a growing industry.  The company will not only make a profit, but also help people at the same time.

Whitney Grace, March 23, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Algorithms: Be Careful with Those College Math Notes

March 13, 2015

I read “Algorithmia Launches With More Than 800 Algorithms On Its Marketplace.” With the world embracing smart software, the monetization of math is no surprise. I would point out that one of my math books is an early version of Numerical Recipes: The Art of Scientific Computing. The book contains more than 400 numerical routines. It includes useful explanations of MCMC, linear programming, Delaunay triangulation, and more.

I also have Advanced Math for Beginners, a Russian textbook. There are other math books on my shelves including a copy of Zbigniew Michalewicz’s Genetic Algorithms + Data Structures = Evolution Programs. My assumption is that I could study the examples in these and other books, create a program, and move forward with my really smart software application. Maybe not? I thought. What happens if I use an algorithm for sale on Algorithmia which I ingested from one of these textbooks? Yikes. Jail time?  A fine? A Google Oracle Java style dust up? Could Algorithmia take legal action against a company dependent on methods taught in college classes?

Stephen E Arnold, March 13, 2015

Algorithm Complexity Simplified

February 23, 2015

I know the experts in search and content processing have nailed sleek, efficient algorithms. For the very few who have no idea what algorithm complexity embraces, may I suggest a romp through “A Gentle Introduction to Algorithm Complexity Analysis.” If those algorithms are not fit like Euell Gibbons, the hyped benefits of a particular system may not be available. In the world of content processing, I am not sure the flowery assertions of marketers and the code itself are necessarily connected. The document appears to be available in Greek, Russian, and Spanish as well as English. Worth a glance in my opinion.
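For those who want a taste before the romp, here is a minimal sketch of why complexity matters. It is my own illustration in Python, not an example from the linked article. The function names and the one million item list are assumptions chosen for the demonstration. The code counts the steps a linear scan and a binary search need to find the same value in a sorted list.

```python
def linear_search_steps(items, target):
    """Count the items examined by a front-to-back scan: O(n) in the worst case."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(items, target):
    """Count the probes made by binary search on a sorted list: O(log n)."""
    steps = 0
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return steps


# Hypothetical test data: one million sorted integers, worst case for the scan.
data = list(range(1_000_000))
linear = linear_search_steps(data, 999_999)
binary = binary_search_steps(data, 999_999)
print(f"linear scan: {linear} steps, binary search: {binary} steps")
```

The scan touches every item. The binary search needs at most 20 probes for a list this size because each probe halves the remaining range. Marketing assertions do not change that arithmetic.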

Stephen E Arnold, February 23, 2015

Math Equation Similarity Search

February 19, 2015

Have you asked, “Is this equation similar to another equation?” If yes, you will want to bookmark SearchOnMath. Enter your equation via point and click and hit search. Bingo. Quite useful.

Stephen E Arnold, February 19, 2015

Enterprise Search Lacks NGIA Functions

January 29, 2015

Users Want More Than Hunting through a Rubbish

CyberOSINT: Next Generation Information Access is now available, according to Ric Manning, the publisher of Stephen E Arnold’s new study. You can order a copy at the Gumroad online store or via the link on Xenky.com.

One of the key chapters in the 176 page study of information retrieval solutions that move beyond search takes you under the hood of an NGIA system. Without reproducing the 10 page chapter and its illustrations, I want to highlight two important aspects of NGIA systems.

When a person requires information under time pressure, traditional systems pose a problem. The user must figure out which repository to query, craft a query or take a stab at what “facet” (category) may contain the information, scan the outputs the system displays, open a document that appears to be related to the query, and then figure out exactly which item of data is the one required. The time these steps consume makes traditional search a non starter in many work situations. The bottleneck is the human’s ability to keep track of which digital repository contains what. Many organizations have idiosyncratic terminology, and users in one department may not be familiar with the terminology used in another unit of the organization.

image

Register for the seminar on the Telestrategies’ Web site.

Traditional enterprise search systems trip and skin their knees over the time issue and over the “locate what’s needed” issue. These are problems that have persisted in search box oriented systems since the days of RECON, SDC Orbit, and Dialcom. There is little a manager can do to create more time. Time is a very valuable commodity, and it often determines what type of decision is made and how risk laden that decision may be.

There is also little one can do to change how a bright human works with a system that forces a busy individual to perform iterative steps that often amount to guessing the word or phrase to unlock what’s hidden in an index or indexes.

Little wonder that convincing a customer to license a traditional keyword system continues to bedevil vendors.

A second problem is the nature of access. There is news floating around that Facebook has been able to generate more ad growth than Google because Facebook has more mobile users. Whether Facebook or Google dominates social mobile, the key development is “mobile.” Workers need information access from devices which have smaller and different form factors from the multi core, 3.5 gigahertz, three screen workstation I am using to write this blog post.

Dataiku: Former Exalead Wizard Strikes Big Data Fire

January 24, 2015

I read “Big Data : Le Français Dataiku Lève 3 millions d’Euros.” The recipient of the cash infusion is Dataiku. Founded by former Exalead wizard Florian Douetteau, Dataiku offers:

a software platform that aggregates all the steps and big data tools necessary to get from raw data to production ready applications. It shortens the load-prepare-test-deploy cycles required to create data driven applications.

The company’s approach is to reduce the complexity of Big Data app construction. The company’s algorithms support predictive analytics. A community edition download is available at http://www.dataiku.com/dss/editions/.

Dataiku plans to open an office in the US in 2015.

Information about Dataiku is at http://www.dataiku.com.

Stephen E Arnold, January 24, 2015

Artificial Intelligence Text for Free

September 24, 2014

Short honk: Artificial intelligence is in the news. If you want to brush up on your expertise, you can download Artificial Intelligence: Foundations of Computational Agents by David Poole and Alan Mackworth. Although published in 2010, the book is quite useful. Get your copy at this link http://bit.ly/1sVhWaq.

Stephen E Arnold, September 24, 2014

Who Wrote What? Will an Algorithm Catch Name Surfers?

August 17, 2014

I read “New Algorithm Gives Credit Where Credit Is Due.” The write up sparked a number of thoughts. Let me highlight a couple of passages that made it into my research file.

The focus of the paper, in my opinion, is documents intended for peer reviewed publications and conferences. The write up did not include a sample of the type of “authorship” labeling that takes place. I dug through my files and located a representative example:

image

This is a paper about stuffing electronics on a contact lens. Microsoft was in this game. Google hired Babak Parviz (aka Babak Amir Parviz, Babak Amirparviz, and Babak Parvis). The paper has four authors:

  • H. Yao
  • A. Afanasiev
  • I. Lahdesmaki
  • B. A. Parviz

The idea is that the numerical recipe devised at the Center for Complex Network Research will figure out who did most of the work. I think this is a good idea because my research suggests that the guys doing the heavy lifting in the lab, with Excel, and writing were Yao, Afanasiev, and Lahdesmaki. The guru for the work was Parviz. I could be wrong, so an algorithm to help me out is of interest.

One of the points I highlighted in the write up was:

Using the algorithm, which Shen [math whiz] developed, the team revealed a new credit allocation system based on how often the paper is co-cited with the other papers published by the paper’s co-authors, capturing the authors’ additional contributions to the field.

Okay, my take on this is that this is a variation of Eugene Garfield’s citation analysis work. That is useful, but it does not dig very deeply into the context for the paper, the patent applications afoot, or the controls placed on the writers by their employers or their conscience. In short, I need some concrete examples or better yet access to the software so I can run some tests. Yep, just like those that mid tier consulting firms (what I call azure chip consultants) do not do. (For reference, see the Netscout legal document or my saucisson write up.)
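Absent the software, here is a rough sketch of how co-citation weighting of the kind described in the quote might work. This is my own simplified reading in Python, not the researchers’ code. The function name, the paper labels, and the author labels are all hypothetical. The idea: take every paper that cites the target paper, count how often each other paper is cited alongside it, and hand each author of the target a slice of those counts for every paper he or she helped write.

```python
from collections import defaultdict


def credit_shares(target, authors_of, citing_refs):
    """Split credit for `target` among its authors using co-citation counts.

    authors_of  maps a paper label to its list of authors (hypothetical data).
    citing_refs is a list of reference lists, one per citing paper.
    """
    # Only citing papers that actually reference the target matter.
    relevant = [refs for refs in citing_refs if target in refs]

    # Co-citation strength: how many of those citing papers mention each paper.
    strength = defaultdict(int)
    for refs in relevant:
        for paper in set(refs):
            strength[paper] += 1

    credit = {author: 0.0 for author in authors_of[target]}
    for paper, count in strength.items():
        paper_authors = authors_of.get(paper, [])
        for author in paper_authors:
            if author in credit:
                # Each paper's weight is divided evenly among its authors.
                credit[author] += count / len(paper_authors)

    total = sum(credit.values())
    if total == 0:
        # Nobody cites the target: fall back to an even split.
        return {author: 1 / len(credit) for author in credit}
    return {author: value / total for author, value in credit.items()}


# Hypothetical example: paper P0 has two authors, A and B.
authors_of = {"P0": ["A", "B"], "P1": ["B"], "P2": ["C"]}
citing_refs = [["P0", "P1"], ["P0", "P1"], ["P0", "P2"], ["P1"]]
shares = credit_shares("P0", authors_of, citing_refs)
print(shares)
```

In this toy data, author B also wrote P1, which is cited alongside P0 twice, so B ends up with the larger share. Whether that matches who did the heavy lifting in the lab is exactly the question I want to test with real software.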

The second point is that the sample strikes me as small. I know the rule of thumb that one well regarded researcher used was 50 in the sample, but there are hundreds of thousands of technical papers. Many are available as open source from services like PLOS One. Here’s the point I noted:

the team looked at 63 prize-winning papers using the algorithm. In another finding, the algorithm showed physicist Tom Kibble, who in 1964 wrote a research paper on the Higgs boson theory, should receive the same amount of credit as Nobel prize winners Peter Higgs and François Englert.

I think the work is interesting, but it is in my opinion not ready for prime time.

I know that one content processing firm almost totally dependent on the US Army for funding has been working to identify misinformation, disinformation, and reformation. So far, the effort has yielded no commercial product. Other companies purport to have the ability to “understand” content. Presumably this includes the entities identified in the content object. Progress has stalled. Smart software is easier to write about in a marketing slide deck or a proposal than to actually deliver.

That’s why authorship remains something a human has to chase down. Let me give you an example. I provided research to IDC, a mid tier consulting firm, in 2012. From August 2012 to July 17, 2014, IDC marketed reports that carried my name, two of my research assistants’ names, and an IDC “expert’s” name. Dave Schubmehl, the IDC “expert” in search, is listed as the “author.”

Now is he?

I am confident that in his mind and in IDC’s corporate wisdom he is the man. The person who justifies surfing on another’s name illustrates a core problem in authorship. You can see examples of Dave Schubmehl’s name surfing at this link. The sale of one of these documents on Amazon was an interesting attempt to gain traction for Dave Schubmehl in the high traffic eBook store. See “Amazon May Be Disintermediating Publishers: Maybe Good News for Authors.” I include a screen shot of the Amazon “hit.” My legal eagle successfully got the document removed from Amazon. I am not an Amazon author and don’t want to be.

Hopefully the algorithm to identify the “real” author of a series of $3,500 reports will become a commercial reality. I am interested to learn if there are any other mid tier consulting firms that have used others’ content without getting appropriate permissions. How many “experts” follow the IDC path of expediency?

For now, name surfers have to be tracked one by one. Schubmehl and Arnold are now linked. Arnold is the surfboard; Schubmehl is the surfer. Catch a wave is the motto of many surfers.

Stephen E Arnold, August 17, 2014

When a Search Vendor Says Fuzzy

August 9, 2014

Short honk: You may have heard a search or content processing vendor use the word “fuzzy” or “fuzzify” to describe a smart system. If you have, you may want to know what may be behind the jargon curtain. For fuzzy insights (intentional word choice, gentle reader) check out “Binary Fuzzing Strategies: What Works, What Doesn’t.” If you are not sure what “works” means, feel free to contact the saucisson at the IDC-type consulting firms. Illumination is only a payment away. If you choose another route, get your math T shirt on and check out https://code.google.com/p/american-fuzzy-lop/wiki/StatusScreen.

Stephen E Arnold, August 9, 2014
