Useful Probability Lesson in Monte Carlo Simulations

April 6, 2015

It is no surprise that probability blogger Count Bayesie, also known as data scientist Will Kurt, likes to play with random data samples like those generated in Monte Carlo simulations. He lets us in on the fun in this useful summary, “6 Neat Tricks with Monte Carlo Simulations.” He begins:

“If there is one trick you should know about probability, it’s how to write a Monte Carlo simulation. If you can program, even just a little, you can write a Monte Carlo simulation. Most of my work is in either R or Python, these examples will all be in R since out-of-the-box R has more tools to run simulations. The basics of a Monte Carlo simulation are simply to model your problem, and then randomly simulate it until you get an answer. The best way to explain is to just run through a bunch of examples, so let’s go!”

And run through his six examples he does, starting with the ever-popular basic integration. Other tricks include approximating the binomial distribution, approximating Pi, finding p-values, creating games of chance, and, of course, predicting the stock market. The examples include code snippets and graphs. Kurt encourages readers to go further:

“By now it should be clear that a few lines of R can create extremely good estimates to a whole host of problems in probability and statistics. There comes a point in problems involving probability where we are often left no other choice than to use a Monte Carlo simulation. This is just the beginning of the incredible things that can be done with some extraordinarily simple tools. It also turns out that Monte Carlo simulations are at the heart of many forms of Bayesian inference.”
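For readers who want a taste before clicking through, here is a minimal sketch of the "approximating Pi" trick. It is written in Python rather than Kurt's R, and it is not his code; the function name `estimate_pi` and the fixed seed are assumptions made for illustration. The idea follows his recipe: model the problem (a quarter circle inside a unit square), then simulate random points and count how many land inside.

```python
import random


def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Estimate pi by sampling points uniformly in the unit square.

    Hypothetical helper for illustration; the seed is fixed only so
    repeated runs give the same estimate.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        # Count the point if it falls inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square, so scale by 4.
    return 4.0 * inside / n_samples


if __name__ == "__main__":
    for n in (1_000, 100_000):
        print(n, estimate_pi(n))
```

The estimate tightens slowly: the error shrinks roughly with the square root of the number of samples, so each extra digit of accuracy costs about 100 times more random points.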

See the write-up for the juicy details of the six examples. This fun and informative lesson is worth checking out.

Cynthia Murrell, April 6, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Written by Stephen E. Arnold · Filed Under algorithms, Analytics, Data, News | 1 Comment

CyberOSINT Update: No Fooling

April 1, 2015

Two quick items about cyber OSINT. These are not April Fool jokes and the information is available without nag screens, registration forms, or blinking ads.

First, we have posted a five minute video that explains what cyber OSINT means. I was interviewed by award winning tech journalist Ric Manning. You can view the video at this link.

Second, we have started a new interview series. Like the original Search Wizards Speak series of interviews, the Cyber Wizards Speak interviews provide more first-person information about cyber OSINT from those working in the field. The interviews are intended for those interested in law enforcement, intelligence, and security. The first interview in the series presents the viewpoints of Luca Scagliarini, one of the original developers of the Expert System Cogito system. You can find the interview at www.xenky.com/expert-system.

Watch for upcoming announcements about more cyber OSINT videos and interviews with the principals of BrightPlanet and Recorded Future.

Copies of my new study CyberOSINT: Next Generation Access are available at www.xenky.com/cyberosint.

Stephen E Arnold, April 1, 2015

Written by Stephen E. Arnold · Filed Under AI, algorithms, Cyber OSINT, News, Technology | Comments Off on CyberOSINT Update: No Fooling

Digital Shadows Searches the Shadow Internet

March 23, 2015

The deep Web is not hidden from Internet users, but regular search engines like Google and Bing do not index it in their results. Security Affairs reported on a new endeavor to search the deep Web in the article, “Digital Shadows Firm Develops A Search Engine For The Deep Web.” Memex and Flashpoint are two search engine projects that are already able to scan the deep Web. Digital Shadows, a British cyber security firm, is working on another search engine specially designed to search the Tor network.

Alistair Paterson, the CEO of Digital Shadows, describes the project as Google for Tor. According to the article:

“Digital Shadows developed the deep Web search engine to offer its services to private firms to help them identifying cyber threats or any other illegal activity that could represent a threat.”

While private firms will need and want this software to detect illegal activities, law enforcement officials currently need deep Web search tools more than those in other fields. They use the tools to track fraud, drug and sex trafficking, robberies, and contraband. Digital Shadows is creating a product that is part of a growing industry. The company will not only make a profit, but also help people at the same time.

Whitney Grace, March 23, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Written by Stephen E. Arnold · Filed Under algorithms, Big data, Data, News, Security | Comments Off on Digital Shadows Searches the Shadow Internet

Algorithms: Be Careful with Those College Math Notes

March 13, 2015

I read “Algorithmia Launches With More Than 800 Algorithms On Its Marketplace.” With the world embracing smart software, the monetization of math is no surprise. I would point out that one of my math books is an early version of Numerical Recipes: The Art of Scientific Computing. The book contains more than 400 numerical routines. It includes useful explanations of MCMC, linear programming, Delaunay triangulation, and more.

I also have Advanced Math for Beginners, a Russian textbook. There are other math books on my shelves including a copy of Zbigniew Michalewicz’s Genetic Algorithms + Data Structures = Evolution Programs. My assumption is that I could study the examples in these and other books, create a program, and move forward with my really smart software application. Maybe not? I thought. What happens if I use an algorithm for sale on Algorithmia which I ingested from one of these textbooks? Yikes. Jail time? A fine? A Google Oracle Java style dust up? Could Algorithmia take legal action against a company dependent on methods taught in college classes?

Stephen E Arnold, March 13, 2015

Written by Stephen E. Arnold · Filed Under algorithms, Legal matters, News | 1 Comment

Algorithm Complexity Simplified

February 23, 2015

I know the experts in search and content processing have nailed sleek, efficient algorithms. For the very few who have no idea what algorithm complexity embraces, may I suggest a romp through “A Gentle Introduction to Algorithm Complexity Analysis.” If those algorithms are not fit like Euell Gibbons, the hyped benefits of a particular system may not be available. In the world of content processing, I am not sure the flowery assertions of marketers and the code itself are necessarily connected. The document appears to be available in Greek, Russian, and Spanish as well as English. Worth a glance in my opinion.
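A quick illustration of why complexity matters, sketched in Python. This is not taken from the linked introduction; the function names and the test data are assumptions chosen for the example. It counts how many items each approach inspects when looking for the last value in a sorted list of one million integers.

```python
def linear_search_steps(items, target):
    """Count items inspected by a front-to-back scan (O(n))."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(items, target):
    """Count items inspected by binary search on a sorted list (O(log n))."""
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return steps


if __name__ == "__main__":
    data = list(range(1_000_000))  # already sorted
    print("linear:", linear_search_steps(data, 999_999))
    print("binary:", binary_search_steps(data, 999_999))
```

The scan inspects every one of the million items in this worst case, while binary search needs no more than about 20 probes. That gap is the sort of thing a marketing slide deck rarely mentions.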

Stephen E Arnold, February 23, 2015

Written by Stephen E. Arnold · Filed Under algorithms, News | Comments Off on Algorithm Complexity Simplified

Math Equation Similarity Search

February 19, 2015

Have you asked, “Is this equation similar to another equation?” If yes, you will want to bookmark SearchOnMath. Enter your equation via point and click and hit search. Bingo. Quite useful.

Stephen E Arnold, February 19, 2015

Written by Stephen E. Arnold · Filed Under algorithms, News | Comments Off on Math Equation Similarity Search

Enterprise Search Lacks NGIA Functions

January 29, 2015

Users Want More Than Hunting through a Rubbish

CyberOSINT: Next Generation Information Access is now available, according to Ric Manning, the publisher of Stephen E Arnold’s new study. You can order a copy at the Gumroad online store or via the link on Xenky.com.

One of the key chapters in the 176 page study of information retrieval solutions that move beyond search takes you under the hood of an NGIA system. Without reproducing the 10 page chapter and its illustrations, I want to highlight two important aspects of NGIA systems.

When a person requires information under time pressure, traditional systems pose a problem. The time required to figure out which repository to query, craft a query or take a stab at what “facet” (category) may contain the information, scan the outputs the system displays, open a document that appears to be related to the query, and then figure out exactly what item of data is the one required makes traditional search a non starter in many work situations. The bottleneck is the human’s ability to keep track of which digital repository contains what. Many organizations have idiosyncratic terminology, and users in one department may not be familiar with the terminology used in another unit of the organization.

Traditional enterprise search systems trip and skin their knees over the time issue and over the “locate what’s needed” issue. These are problems that have persisted in search box oriented systems since the days of RECON, SDC Orbit, and Dialcom. There is little a manager can do to create more time. Time is a very valuable commodity and it often determines what type of decision is made and how risk laden that decision may be.

There is also little one can do to change how a bright human works with a system that forces a busy individual to perform iterative steps that often amount to guessing the word or phrase to unlock what’s hidden in an index or indexes.

Little wonder that convincing a customer to license a traditional keyword system continues to bedevil vendors.

A second problem is the nature of access. There is news floating around that Facebook has been able to generate more ad growth than Google because Facebook has more mobile users. Whether Facebook or Google dominates social mobile, the key development is “mobile.” Workers need information access from devices which have smaller and different form factors from the multi core, 3.5 gigahertz, three screen workstation I am using to write this blog post.

Written by Stephen E. Arnold · Filed Under AI, algorithms, Feature, NGIA | Comments Off on Enterprise Search Lacks NGIA Functions

Dataiku: Former Exalead Wizard Strikes Big Data Fire

January 24, 2015

I read “Big Data : Le Français Dataiku Lève 3 millions d’Euros.” The recipient of the cash infusion is Dataiku. Founded by former Exalead wizard Florian Douetteau, Dataiku offers:

a software platform that aggregates all the steps and big data tools necessary to get from raw data to production ready applications. It shortens the load-prepare-test-deploy cycles required to create data driven applications.

The company’s approach is to reduce the complexity of Big Data app construction. The company’s algorithms support predictive analytics. A community edition download is available at http://www.dataiku.com/dss/editions/.

Dataiku plans to open an office in the US in 2015.

Information about Dataiku is at http://www.dataiku.com.

Stephen E Arnold, January 24, 2015

Written by Stephen E. Arnold · Filed Under algorithms, Big data, Financial, News | Comments Off on Dataiku: Former Exalead Wizard Strikes Big Data Fire

Artificial Intelligence Text for Free

September 24, 2014

Short honk: Artificial intelligence is in the news. If you want to brush up on your expertise, you can download Artificial Intelligence: Foundations of Computational Agents by David Poole and Alan Mackworth. Although published in 2010, the book is quite useful. Get your copy at this link http://bit.ly/1sVhWaq.

Stephen E Arnold, September 24, 2014

Written by Stephen E. Arnold · Filed Under algorithms, News | 1 Comment

Who Wrote What? Will an Algorithm Catch Name Surfers?

August 17, 2014

I read “New Algorithm Gives Credit Where Credit Is Due.” The write up sparked a number of thoughts. Let me highlight a couple of passages that made it into my research file.

The focus of the paper, in my opinion, is documents intended for peer reviewed publications and conferences. The write up did not include a sample of the type of “authorship” labeling that takes place. I dug through my files and located a representative example:

This is a paper about stuffing electronics on a contact lens. Microsoft was in this game. Google hired Babak Parviz (aka Babak Amir Parviz, Babak Amirparviz, and Babak Parvis). The paper has four authors:

H. Yao
A. Afanasiev
I. Lahdesmaki
B. A. Parviz

The idea is that the numerical recipe devised at the Center for Complex Network Research will figure out who did most of the work. I think this is a good idea because my research suggests that the guys doing the heavy lifting in the lab, with Excel, and writing were Yao, Afanasiev, and Lahdesmaki. The guru for the work was Parviz. I could be wrong, so an algorithm to help me out is of interest.

One of the points I highlighted in the write up was:

Using the algorithm, which Shen [math whiz] developed, the team revealed a new credit allocation system based on how often the paper is co-cited with the other papers published by the paper’s co-authors, capturing the authors’ additional contributions to the field.

Okay, my take on this is that this is a variation of Eugene Garfield’s citation analysis work. That is useful, but it does not dig very deeply into the context for the paper, the patent applications afoot, or the controls placed on the writers by their employers or their conscience. In short, I need some concrete examples or better yet access to the software so I can run some tests. Yep, just like those that mid tier consulting firms (what I call azure chip consultants) do not do. (For reference, see the Netscout legal document or my saucisson write up.)
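Absent the team's software, here is a toy sketch in Python of what a co-citation based credit split might look like. To be loud about it: this is not Shen's algorithm, only a simplified reading of the quoted description. The function name `credit_shares`, the equal split of each paper among its authors, and the made-up authors "A", "B", and "C" with their co-citation counts are all assumptions for illustration.

```python
from collections import defaultdict


def credit_shares(target_authors, cocited_papers):
    """Toy credit split for one target paper (hypothetical helper).

    cocited_papers is a list of (authors, cocitation_count) pairs: papers
    cited alongside the target paper and how often that happens. The
    target paper itself can be included with its own citation count.
    """
    raw = defaultdict(float)
    for authors, strength in cocited_papers:
        for author in authors:
            if author in target_authors:
                # Assumption: each paper's weight is split equally
                # among that paper's authors.
                raw[author] += strength / len(authors)
    total = sum(raw.values())
    if total == 0:
        # No co-citation evidence: fall back to an even split.
        return {a: 1.0 / len(target_authors) for a in target_authors}
    return {a: raw.get(a, 0.0) / total for a in target_authors}


if __name__ == "__main__":
    # Invented data: the target paper is by A and B.
    papers = [
        (["A", "B"], 10),  # the target paper itself
        (["A"], 6),        # a solo paper by A, often co-cited
        (["B", "C"], 4),   # B's paper with an outside co-author
    ]
    print(credit_shares(["A", "B"], papers))
```

With these invented numbers, A ends up with 11/18 of the credit and B with 7/18, because A's other work is cited alongside the target paper more often. Whether that says anything about who did the heavy lifting in the lab is another matter.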

The second point is that the sample strikes me as small. I know the rule of thumb that one well regarded researcher used was 50 in the sample, but there are hundreds of thousands of technical papers. Many are available as open source from services like PLOS One. Here’s the point I noted:

the team looked at 63 prize-winning papers using the algorithm. In another finding, the algorithm showed physicist Tom Kibble, who in 1964 wrote a research paper on the Higgs boson theory, should receive the same amount of credit as Nobel prize winners Peter Higgs and François Englert.

I think the work is interesting, but it is in my opinion not ready for prime time.

I know that one content processing firm almost totally dependent on the US Army for funding has been working to identify misinformation, disinformation, and reformation. So far, the effort has yielded no commercial product. Other companies purport to have the ability to “understand” content. Presumably this includes the entities identified in the content object. Progress has stalled. Smart software is easier to write about in a marketing slide deck or a proposal than to actually deliver.

That’s why authorship remains something a human has to chase down. Let me give you an example. I provided research to IDC, a mid tier consulting firm, in 2012. From August 2012 to July 17, 2014, IDC marketed reports that carried my name, two of my research assistants’ names, and an IDC “expert’s” name. Dave Schubmehl, the IDC “expert” in search, is listed as the “author.”

Now is he?

I am confident that in his mind and in IDC’s corporate wisdom he is the man. The person who justifies surfing on another’s name illustrates a core problem in authorship. You can see examples of Dave Schubmehl’s name surfing at this link. The sale of one of these documents on Amazon was an interesting attempt to gain traction for Dave Schubmehl in the high traffic eBook store. See “Amazon May Be Disintermediating Publishers: Maybe Good News for Authors.” I include a screen shot of the Amazon “hit.” My legal eagle successfully got the document removed from Amazon. I am not an Amazon author and don’t want to be.

Hopefully the algorithm to identify the “real” author of a series of $3,500 reports will become a commercial reality. I am interested to learn if there are any other mid tier consulting firms that have used others’ content without getting appropriate permissions. How many “experts” follow the IDC path of expediency?

For now, name surfers have to be tracked one by one. Schubmehl and Arnold are now linked. Arnold is the surfboard; Schubmehl is the surfer. Catch a wave is the motto of many surfers.

Stephen E Arnold, August 17, 2014

Written by Stephen E. Arnold · Filed Under algorithms, Business strategy, News | 1 Comment


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
