Artificial Intelligence Competition Reveals Need for More Learning

March 3, 2016

The capabilities of robots are growing but, on the whole, have not yet surpassed a middle school education. The article “Why AI can still hardly pass an eighth grade science test” from Motherboard shares insights into the current state of artificial intelligence as revealed by a recent AI competition. Chaim Linhart, a researcher from the Israeli startup TaKaDu, took the first place prize of $50,000. The winner, however, scored only 59.3 percent on a series of tasks tougher than the conventional Turing Test. The article describes how the winners used machine learning models:

“Tafjord explained that all three top teams relied on search-style machine learning models: they essentially found ways to search massive test corpora for the answers. Popular text sources included dumps of Wikipedia, open-source textbooks, and online flashcards intended for studying purposes. These models have anywhere between 50 to 1,000 different “features” to help solve the problem—a simple feature could look at something like how often a question and answer appear together in the text corpus, or how close words from the question and answer appear.”
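To make the “feature” idea concrete, here is a minimal sketch, in Python, of one such feature: score each multiple-choice answer by how often its words appear in the same sentence as the question’s words in a text corpus. The toy corpus, question, and scoring function are illustrative assumptions, not the competitors’ actual code.

```python
# A minimal sketch of one "feature" of the kind described above: score each
# multiple-choice answer by counting sentences in which question words and
# answer words co-occur. The corpus, question, and answers are toy examples.

corpus = [
    "Photosynthesis converts sunlight into chemical energy in plants.",
    "Plants use chlorophyll to capture sunlight during photosynthesis.",
    "Mitochondria release energy stored in glucose.",
]

def tokenize(text):
    """Lowercase, strip simple punctuation, and return a set of words."""
    return {w.strip(".,?!").lower() for w in text.split()}

def cooccurrence_score(question, answer):
    """Count sentences containing at least one question word and one answer word."""
    q_words, a_words = tokenize(question), tokenize(answer)
    return sum(
        1 for sentence in corpus
        if q_words & tokenize(sentence) and a_words & tokenize(sentence)
    )

question = "What do plants use to capture sunlight?"
answers = ["chlorophyll", "mitochondria", "glucose"]
best = max(answers, key=lambda a: cooccurrence_score(question, a))
print(best)  # -> chlorophyll
```

The competing models reportedly combined anywhere from 50 to 1,000 such features; how those features are weighted and combined is not detailed in the article.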

The second- and third-place winners scored within about one percent of Linhart’s robot. That may suggest a competitive market when the time comes, or perhaps, as the article suggests, that nothing very groundbreaking has been developed quite yet. Will search-based machine learning models continue to be expanded and built upon, or will another paradigm be necessary for AI to earn an A?

Megan Feil, March 3, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Delve Is No Jarvis

March 3, 2016

A podcast at SearchContentManagement, “Is Microsoft Delve Iron Man’s Edwin Jarvis? No Way,” examines the ways Delve has yet to live up to its hype. Microsoft extolled the product when it was released as part of the Office 365 suite last year. As any developer can tell you, though, it is far easier to market software than to deliver a polished product. Editor Lauren Horwitz explains:

“While it was designed to be a business intelligence (BI), enterprise search and collaboration tool wrapped into one, it has yet to make good on that vision. Delve was intended to be able to search users’ documents, email messages, meetings and more, then serve up relevant content and messages to them based on their content and activities. At one level, Delve has failed because it hasn’t been as comprehensive a search tool as it was billed. At another level, users have significant concerns about their privacy, given the scope of documents and activities Delve is designed to scour. As BI and SharePoint expert Scott Robinson notes in this podcast, Delve was intended to be much like Edwin Jarvis, butler and human search tool for Iron Man’s Tony Stark. But Delve ain’t no Jarvis, Robinson said.”

So, Delve was intended to learn enough about users to offer them just what they need when they need it, but the tool did not tap deeply enough into their files to effectively anticipate those needs. On top of that, its process is so opaque that most users do not appreciate what it is doing, Robinson indicated. For more on Delve’s underwhelming debut, check out the ten-minute podcast.

 

Cynthia Murrell, March 3, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Drone2Map: Smart Software

March 2, 2016

If you are interested in mapping and geospatial analyses, you will want to read “ESRI Introduces Drone2Map to Process Aerial Images.” The write up reports:

Drone2Map incorporates Pix4D’s powerful image-processing engine to analyze images taken from drones and convert them into a variety of 2-D and 3-D maps.

What’s interesting to me is that the software is available for public download. You will need to know about ArcGIS and some other tools.

You can find the software at this link. You will have to jump through a couple of hoops. Don’t forget to register your drone.

Stephen E Arnold, March 2, 2016

Stolen Online Account Info Now More Valuable than Stolen Credit Card Details

March 2, 2016

You should be aware that criminals are now less interested in your credit cards and other “personally identifiable information” and more keen on exploiting your online accounts. As security firm Tripwire informs us in their State of Security blog, “Stolen Uber, PayPal Accounts More Coveted than Credit Cards on the Dark Web.” Writer Maritza Santillan explains:

“The price of these stolen identifiers on the underground marketplace, or ‘the Dark Web,’ shows the value of credit cards has declined in the last year, according to security firm Trend Micro. Last week, stolen Uber account information could be found on underground marketplaces for an average of $3.78 per account, while personally identifiable information, such as Social Security Numbers or dates of birth, ranged from $1 to $3.30 on average – down from $4 per record in 2014, reported CNBC. Furthermore, PayPal accounts – with a guaranteed balance of $500 – were found to have an average selling price of $6.43. Facebook logins sold for an average of $3.02, while Netflix credentials sold for about 76 cents. By contrast, U.S.-issued credit card information, which is sold in bundles, was listed for no more than 22 cents each, said CNBC.”

The article goes on to describe a few ways criminals can leverage these accounts, like booking Uber “ghost rides” or assembling personal details for a very thorough identity theft. Pros say the trend means service providers need to pay closer attention to usage patterns and to beef up their authentication processes. Specifically, says Forrester’s Andras Cser, it is time to move beyond passwords; instead, he proposes, companies should look for changes in biometric data, like phone position and finger pressure, which would be communicated back to them by our mobile devices. So we’re about to be even more closely monitored by the companies we give our money to. All for our own good, of course.

 

Cynthia Murrell, March 2, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

No Search, Just Browse Images on FindA.Photo

March 2, 2016

The search engine FindA.Photo proves itself to be a useful resource for browsing images based on any number of markers. The site offers a general search by terms, or the option of browsing images by color, collection (for example, “wild animals” or “reflections”), or source. The developer of the site, David Barker, described his goals for the service on Product Hunt:

“I wanted to make a search for all of the CC0 image sites that are available. I know there are already a few search sites out there, but I specifically wanted to create one that was: simple and fast (and I’m working on making it faster), powerful (you can add options to your search for things like predominant colors and image size with just text), and something that could have contributions from anyone (via GitHub pull requests).”

My first click on a swatch of royal blue delivered 651 images of oceans, skies, panoramas of oceans and skies, jellyfish ballooning underwater, seagulls soaring, etc. That may be my own fault for choosing such a clichéd color, but you get the idea. I had better (more varied) results through the collections search, which includes “action,” “long-exposure,” “technology,” “light rays,” and “landmarks,” the last of which I immediately clicked for a collage of photos of the Eiffel Tower, Louvre, Big Ben, and the Great Wall of China.

 

Chelsea Kerwin, March 2, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Real Time: Maybe, Maybe Not

March 1, 2016

Years ago an outfit in Europe wanted me to look at claims made by search and content processing vendors about real time functions.

The goslings and I rounded up the systems, pumped our test corpus through, and tried to figure out what was real time.

The general buzzy Teddy Bear notion of real time is that when new data are available to the system, the system processes the data and makes them available to other software processes and users.

The Teddy Bear view is:

  1. Zero latency
  2. Works reliably
  3. No big deal for modern infrastructure
  4. No engineering required
  5. Any user connected to the system has immediate access to reports, including the new or changed data

Well, guess what, Pilgrim?

We learned quickly that real time, like love and truth, is a darned slippery concept. Here’s one view of what we learned:


Types of Real Time Operations. © Stephen E Arnold, 2009

The main point of the chart is that there are six types of real time search and content processing. When someone says, “real time,” there are a number of questions to ask. The major finding of the study was that, for a financial trading outfit, near real time processing costs soar into seven figures and may keep rising as the volume of data to be processed goes up. The other big finding was that every real time system introduces latency. Seconds, minutes, hours, days, even weeks may pass before an update actually becomes available to other subsystems or to users.

If you think you are looking at real time info, you may want to shoot us an email. We can help you figure out which type of “real time” your real time system is delivering. Write benkent2020 @ yahoo dot com and put Real Time in the subject line, gentle reader.
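To make the latency point concrete, here is a hypothetical Python sketch of how one can measure the gap between the moment a record enters a pipeline and the moment it becomes queryable. The stage names and the simulated indexing delay are assumptions for illustration, not a description of any vendor’s system.

```python
# Hypothetical sketch: measure the ingest-to-availability latency that every
# "real time" system introduces. The stage names and the simulated indexing
# delay are illustrative only.

import time

def ingest(record):
    """Stamp the record the moment it enters the pipeline."""
    record["ingested_at"] = time.time()
    return record

def index(record):
    """Simulate the processing stage (parsing, enrichment, commit to the index)."""
    time.sleep(0.25)  # stand-in for whatever delay the pipeline actually adds
    record["available_at"] = time.time()
    return record

record = index(ingest({"id": 1, "text": "new filing received"}))
latency = record["available_at"] - record["ingested_at"]
print(f"ingest-to-query latency: {latency:.2f} seconds")
```

The interesting question in our work was never whether such a gap exists; it was how large the gap grows once parsing, enrichment, and index commits stack up across subsystems.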

I thought about this research project when I read “Why the Search Console Reporting Is not real time: Explains Google!” As you work through the write up, you will see that the latency in the system is essentially part of the woodwork. The data one accesses is stale. Figuring out how stale is a fairly big job. The Alphabet Google thing is dealing with budgets, infrastructure costs, and a new chief financial officer.

Real time. Not now and not unless something magic happens to eliminate latencies, marketing baloney, and user misunderstanding of real time.

Excitement in non real time.

Stephen E Arnold, March 1, 2016

Natural Language Processing App Gains Increased Vector Precision

March 1, 2016

For us, concepts have meaning in relationship to other concepts, but it is easier for computers to define concepts in terms of usage statistics. The post “Sense2vec with spaCy and Gensim” from spaCy’s blog offers a well-written outline of how natural language processing works, highlighting the new sense2vec app. The application is an upgraded version of word2vec that works with more context-sensitive word vectors. The article describes more precisely how sense2vec works:

“The idea behind sense2vec is super simple. If the problem is that duck as in waterfowl and duck as in crouch are different concepts, the straight-forward solution is to just have two entries, duckN and duckV. We’ve wanted to try this for some time. So when Trask et al (2015) published a nice set of experiments showing that the idea worked well, we were easy to convince.

We follow Trask et al in adding part-of-speech tags and named entity labels to the tokens. Additionally, we merge named entities and base noun phrases into single tokens, so that they receive a single vector.”
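Here is a minimal sketch of the idea, assuming spaCy 3.x for tagging and Gensim 4.x for training; the toy corpus and the “word|POS” / “entity|LABEL” tagging scheme are placeholders rather than the authors’ exact pipeline, which was trained on a year of Reddit comments.

```python
# A minimal sketch of the sense2vec idea, assuming spaCy 3.x and Gensim 4.x.
# The corpus and the "word|POS" / "entity|LABEL" tagging scheme are
# placeholders; the actual model was trained on 2015 Reddit comments.

import spacy
from gensim.models import Word2Vec

nlp = spacy.load("en_core_web_sm")

def sense_tokens(text):
    """Return sense-tagged tokens: named entities merged, POS tags appended."""
    doc = nlp(text)
    tagged, covered = [], set()
    for ent in doc.ents:  # merge each named entity into a single token
        tagged.append((ent.start, f"{ent.text.replace(' ', '_')}|{ent.label_}"))
        covered.update(range(ent.start, ent.end))
    for tok in doc:
        if tok.i not in covered and not tok.is_punct:
            tagged.append((tok.i, f"{tok.lower_}|{tok.pos_}"))
    return [t for _, t in sorted(tagged)]

corpus = [
    "The duck paddled across the pond.",
    "She had to duck under the low beam.",
    "Google released a new search feature.",
]
sentences = [sense_tokens(line) for line in corpus]
print(sentences[0])  # e.g. ['the|DET', 'duck|NOUN', 'paddled|VERB', ...]
print(sentences[1])  # e.g. ['she|PRON', 'had|VERB', 'to|PART', 'duck|VERB', ...]

# Each sense-tagged key ("duck|NOUN" vs. "duck|VERB") receives its own vector.
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, epochs=20)
```

With a corpus of real size, the noun and verb senses of “duck” end up with separate, independently trained vectors, which is the whole point of the exercise.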

Curious about the meta definition of natural language processing from spaCy, we queried “natural language processing” using sense2vec. Its neural network was trained on every word posted to Reddit in 2015. While it is a feat for NLP to learn from a dataset drawn from a single platform such as Reddit, what about processing that scours multiple data sources?

 

Megan Feil, March 1, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

IBM Continues to Brag About Watson, with Decreasing Transparency

February 29, 2016

A totally objective article sponsored by IBM on Your Story is titled “How Cognitive Systems Like IBM Watson Are Changing the Way We Solve Problems.” The article basically functions to promote all of the cognitive computing capabilities most of us are already keenly aware Watson possesses, and to raise awareness of the hackathon event taking place in Bengaluru, India. The “article” endorses the event:

“Participants will have an unprecedented opportunity to collaborate, co-create and exchange ideas with one another and the world’s most forward-thinking cognitive experts. This half-day event will focus on sharing real-world applications of cognitive technologies, and allow attendees access to the next wave of innovations and applications through an interactive experience. The program will also include panel discussions and fireside chats between senior IBM executives and businesses that are already working with Watson.”

Since 2015, the “Watson for Oncology” program has involved Manipal Hospitals in Bengaluru, India. The program is the result of a partnership between IBM and Memorial Sloan Kettering Cancer Center in New York. Watson has now consumed almost 15 million pages of medical content from textbooks and journals in the hope of providing rapid-fire support to hospital staffers when it comes to patient records and diagnosis. Perhaps if IBM put all of its effort into Watson’s projects instead of creating inane web content to promote him as some sort of missionary, he could have already cured cancer. Or not.

 

Chelsea Kerwin, February 29, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Intel Identifies the Future of High Performance Computing. Surprise. It Is Itself

February 29, 2016

I make a feeble attempt to pay attention to innovations in high performance computing. The reason is that some mathematical procedures require lots of computing resources; for example, figuring out the interactions in a fusion plasma test. Think in terms of weeks of calculation. Bummer. Most folks believe that the cloud and other semi-magical marketing buzzwords have made supercomputers as fast as those in a sci-fi movie. Wrong, gentle reader. There are computational issues. Big O?


I read with interest “The Future of High Performance Computing Has Arrived.” The write up does not do too much with the GPU methods, the brute force methods, or the “quantum” hopes and dreams.

Nope.

The write up points out, via a nifty diagram with many Intel labels:

Intel is tightly integrating the technologies at both the component and system levels, to create a highly efficient and capable infrastructure. One of the outcomes of this level of integration is how it scales across both the node and the system. The result is that it essentially raises the center of gravity of the memory pyramid and makes it fatter, which will enable faster and more efficient data movement.

I like the mathy center of gravity lingo. It reminds me of the “no gravity” buzzword from 15 years ago.

Allegedly, Moore’s Law is dead. Maybe? Maybe not? But as long as we are geared up with von Neumann’s saddles and bits, Intel is going to ride that pony.

Gentle reader, we need much more computing horse power. Is it time to look for a different horse to ride? Intel does not agree.

Stephen E Arnold, February 29, 2016

New Tor Communication Software for Journalists and Sources Launches

February 29, 2016

A new one-to-one messaging tool for journalists has launched after two years in development. The article “Ricochet uses power of the dark web to help journalists, sources dodge metadata laws” from The Age describes the new darknet-based software. What sets Ricochet apart from other tools journalists use, such as Wickr, is that it runs over Tor rather than relying on a central server. Advocates acknowledge the risk of Dark Web software being used for criminal activity but assert that the aim is to give sources and whistleblowers an anonymous channel for securely releasing information to journalists without exposure. The article explains:

“Dr Dreyfus said that the benefits of making the software available would outweigh any risks that it could be used for malicious purposes such as cloaking criminal and terrorist operations. “You have to accept that there are tools, which on balance are a much greater good to society even though there’s a tiny possibility they could be used for something less good,” she said. Mr Gray argued that Ricochet was designed for one-to-one communications that would be less appealing to criminal and terrorist organisers that need many-to-many communications to carry out attacks and operations. Regardless, he said, the criminals and terrorists had so many encryption and anonymising technologies available to them that pointing fingers at any one of them was futile.”

Demand for online anonymity is clearly on the rise, as evidenced by the recent launch of several new Tor-based tools like Ricochet, alongside Wickr and consumer-oriented apps like Snapchat. The Dark Web’s user base appears to be growing and diversifying. Will public perception follow suit?

 

Megan Feil, February 29, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
