Real Time: Maybe, Maybe Not

March 1, 2016

Years ago an outfit in Europe wanted me to look at claims made by search and content processing vendors about real time functions.

The goslings and I rounded up the systems, pumped our test corpus through, and tried to figure out what was real time.

The general buzzy Teddy Bear notion of real time is that when new data are available to the system, the system processes the data and makes them available to other software processes and users.

The Teddy Bear view is:

  1. Zero latency
  2. Works reliably
  3. No big deal for modern infrastructure
  4. No engineering required
  5. Any user connected to the system has immediate access to reports including the new or changed data.

Well, guess what, Pilgrim?

We learned quickly that real time, like love and truth, is a darned slippery concept. Here’s one view of what we learned:

Types of Real Time Operations. © Stephen E Arnold, 2009

The main point of the chart is that there are six types of real time search and content processing. When someone says, “Real time,” there are a number of questions to ask.

The major finding of the study was that near real time processing for a financial trading outfit costs well into seven figures, and the bill may keep rising as the volume of data to be processed goes up. The other big finding was that every real time system introduces latency. Seconds, minutes, hours, days, or even weeks may pass before an update actually becomes available to other subsystems or to users.

If you think you are looking at real time info, you may want to shoot us an email. We can help you figure out which type of “real time” your real time system is delivering. Write benkent2020 @ yahoo dot com and put Real Time in the subject line, gentle reader.
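
If you want to watch the latency gremlin at work, a toy pipeline makes the point. The sketch below is ours, not from the study: a record gets a timestamp when it enters the ingest queue and another when the indexer makes it visible, with a sleep standing in for real parse-and-index work. The gap between the two stamps is never zero.

    import time
    import queue

    ingest = queue.Queue()

    def write(record):
        record["ingested_at"] = time.time()   # moment the new data arrive
        ingest.put(record)

    def index_next():
        record = ingest.get()
        time.sleep(0.5)                       # stand-in for parse and index work
        record["visible_at"] = time.time()    # moment users can actually see it
        return record

    write({"id": 1, "body": "new data"})
    rec = index_next()
    print(f"ingest-to-visible latency: {rec['visible_at'] - rec['ingested_at']:.2f}s")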

I thought about this research project when I read “Why the Search Console Reporting Is not real time: Explains Google!” As you work through the write up, you will see that the latency in the system is essentially part of the woodwork. The data one accesses are stale. Figuring out how stale is a fairly big job. The Alphabet Google thing is dealing with budgets, infrastructure costs, and a new chief financial officer.

Real time. Not now and not unless something magic happens to eliminate latencies, marketing baloney, and user misunderstanding of real time.

Excitement in non real time.

Stephen E Arnold, March 1, 2016

Natural Language Processing App Gains Increased Vector Precision

March 1, 2016

For us, concepts have meaning in relationship to other concepts, but it’s easy for computers to define concepts in terms of usage statistics. The post “Sense2vec with spaCy and Gensim” from spaCy’s blog offers a well-written outline of how natural language processing works, highlighting the new sense2vec app. This application is an upgraded version of word2vec that works with more context-sensitive word vectors. The article describes how sense2vec works more precisely,

“The idea behind sense2vec is super simple. If the problem is that duck as in waterfowl and duck as in crouch are different concepts, the straight-forward solution is to just have two entries, duck_N and duck_V. We’ve wanted to try this for some time. So when Trask et al (2015) published a nice set of experiments showing that the idea worked well, we were easy to convince.

We follow Trask et al in adding part-of-speech tags and named entity labels to the tokens. Additionally, we merge named entities and base noun phrases into single tokens, so that they receive a single vector.”
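
The recipe is easy to mock up. Here is a minimal sketch of the idea, not spaCy’s actual sense2vec code: tag each token with its part of speech so the two senses of duck become distinct vocabulary entries, then train an ordinary word2vec model. The toy corpus and vector sizes are ours, and the snippet assumes spaCy’s small English model and gensim 4 are installed.

    import spacy
    from gensim.models import Word2Vec

    nlp = spacy.load("en_core_web_sm")  # assumes this model has been downloaded

    texts = ["The duck swam across the pond.", "You should duck under the beam."]
    # Append the part-of-speech tag so each sense gets its own vocabulary entry.
    sentences = [[f"{t.text.lower()}|{t.pos_}" for t in nlp(doc) if not t.is_punct]
                 for doc in texts]

    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1)
    print(model.wv["duck|NOUN"][:5])  # a separate vector from "duck|VERB"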

Curious about the meta definition of natural language processing from spaCy, we queried “natural language processing” using sense2vec. Its model is based on every word posted to Reddit in 2015. While it is a feat for NLP to learn from a dataset on one platform, such as Reddit, what about processing that scours multiple data sources?


Megan Feil, March 1, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


IBM Continues to Brag About Watson, with Decreasing Transparency

February 29, 2016

A totally objective article sponsored by IBM on Your Story is titled “How Cognitive Systems Like IBM Watson Are Changing the Way We Solve Problems.” The article basically functions to promote the cognitive computing capabilities most of us already know Watson possesses, and to raise awareness of the Hackathon event taking place in Bengaluru, India. The “article” endorses the event,

“Participants will have an unprecedented opportunity to collaborate, co-create and exchange ideas with one another and the world’s most forward-thinking cognitive experts. This half-day event will focus on sharing real-world applications of cognitive technologies, and allow attendees access to the next wave of innovations and applications through an interactive experience. The program will also include panel discussions and fireside chats between senior IBM executives and businesses that are already working with Watson.”

Since 2015, the “Watson for Oncology” program has involved Manipal Hospitals in Bengaluru, India. The program is the result of a partnership between IBM and Memorial Sloan Kettering Cancer Center in New York. Watson has now consumed almost 15 million pages of medical content from textbooks and journals in the hopes of providing rapid-fire support to hospital staffers when it comes to patient records and diagnosis. Perhaps if IBM put all of its efforts into Watson’s projects instead of creating inane web content to promote him as some sort of missionary, he could have already cured cancer. Or not.


Chelsea Kerwin, February 29, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Intel Identifies the Future of High Performance Computing. Surprise. It Is Itself

February 29, 2016

I make a feeble attempt to pay attention to innovations in high performance computing. The reason is that some mathematical procedures require lots of computing resources; for example, figuring out the interactions in a fusion plasma test. Think in terms of weeks of calculation. Bummer. Most folks believe that the cloud and other semi-magical marketing buzzwords have made supercomputers as fast as those in a sci-fi movie. Wrong, gentle reader. There are computational issues. Big O?

I read with interest “The Future of High Performance Computing Has Arrived.” The write up does not do too much with the GPU methods, the brute force methods, or the “quantum” hopes and dreams.

Nope.

The write up points out, with a nifty diagram sporting many Intel labels:

“Intel is tightly integrating the technologies at both the component and system levels, to create a highly efficient and capable infrastructure. One of the outcomes of this level of integration is how it scales across both the node and the system. The result is that it essentially raises the center of gravity of the memory pyramid and makes it fatter, which will enable faster and more efficient data movement.”

I like the mathy center of gravity lingo. It reminds me of the “no gravity” buzzword from 15 years ago.

Allegedly Moore’s Law is dead. Maybe? Maybe not? But as long as we are geared up with Von Neumann’s saddles and bits, Intel is going to ride that pony.

Gentle reader, we need much more computing horsepower. Is it time to look for a different horse to ride? Intel does not agree.

Stephen E Arnold, February 29, 2016

New Tor Communication Software for Journalists and Sources Launches

February 29, 2016

A new one-to-one messaging tool for journalists has launched after two years in development. The article “Ricochet uses power of the dark web to help journalists, sources dodge metadata laws” from The Age describes the new darknet-based software. What sets Ricochet apart from other tools journalists use, such as Wickr, is that it does not rely on a central server; messages travel through Tor instead. Advocates acknowledge the risk of this Dark Web software being used for criminal activity but assert the aim is to give sources and whistleblowers an anonymous channel for securely releasing information to journalists without exposure. The article explains,

“Dr Dreyfus said that the benefits of making the software available would outweigh any risks that it could be used for malicious purposes such as cloaking criminal and terrorist operations. “You have to accept that there are tools, which on balance are a much greater good to society even though there’s a tiny possibility they could be used for something less good,” she said. Mr Gray argued that Ricochet was designed for one-to-one communications that would be less appealing to criminal and terrorist organisers that need many-to-many communications to carry out attacks and operations. Regardless, he said, the criminals and terrorists had so many encryption and anonymising technologies available to them that pointing fingers at any one of them was futile.”
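
For the technically curious, here is what “no server, just Tor” looks like at the socket level. This is a minimal sketch under our own assumptions, not Ricochet’s actual protocol: it presumes a Tor daemon listening locally on port 9050, uses PySocks for the SOCKS5 plumbing, and the .onion address and port are made up.

    import socks  # PySocks: pip install pysocks

    s = socks.socksocket()
    # rdns=True hands name resolution to Tor; a .onion address must
    # never be resolved locally.
    s.set_proxy(socks.SOCKS5, "127.0.0.1", 9050, rdns=True)
    s.connect(("exampleonionaddress1234.onion", 9878))  # hypothetical hidden service
    s.sendall(b"hello via a hidden service\n")
    print(s.recv(1024))
    s.close()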

Online anonymity is in increasing demand, as evidenced by the recent launch of several new Tor-based tools like Ricochet, alongside encrypted messengers like Wickr and consumer-oriented apps like Snapchat. The Dark Web’s user base appears to be growing and diversifying. Will public perception follow suit?


Megan Feil, February 29, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Startup Semantic Machines Scores Funding

February 26, 2016

A semantic startup looks poised for success with experienced executives and a hefty investment, we learn from “Artificial Intelligence Startup Semantic Machines Raises $12.3 Million” at VentureBeat. Backed by investors from Bain Capital Ventures and General Catalyst Partners, the enterprise focuses on deep learning and improved speech recognition. The write-up reveals:

“Last year, Semantic Machines named Larry Gillick as its chief technology officer. Gillick was previously chief speech scientist for Siri at Apple. Now Semantic Machines is looking to go further than Siri and other personal digital assistants currently on the market. ‘Semantic Machines is developing technology that goes beyond understanding commands, to understanding conversations,’ the startup says on its website. ‘Our Conversational AI represents a powerful new paradigm, enabling computers to communicate, collaborate, understand our goals, and accomplish tasks.’ The startup is building tools that third-party developers will be able to use.”

Launched in 2014, Semantic Machines is based in Newton, Massachusetts, with offices in Berkeley and Boston. The startup is also seeking to hire a few researchers and engineers, in case anyone is interested.


Cynthia Murrell, February 26, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

More Hacked US Voter Data Appears on the Dark Web

February 25, 2016

From HackRead comes a piece called “More US Voters Data Circulating on the Dark Net,” which points to the lack of protection surrounding data on US voters. The data were leaked on the Dark Web site The Hell. No reports yet explain how the data were obtained. While no Social Security numbers or other highly sensitive information was released, the records include names, dates of birth, voter registration dates, voting records, political affiliations, and addresses. Continuing the explanation of the implications, the article’s author writes,

“However, it provides any professional hacker substantial information to initiate and plan a phishing attack in the next election which takes place in the US. Recent discoveries, news and speculations have exposed the role of nation-state actors and cyber criminals in planning, instigating and initiating hacking attacks aimed at maligning the upcoming US elections. While social media has emerged as one of the leading platforms adopted by politicians when they wish to spread a certain message or image, cyber criminals and non-state actors are also utilizing the online platform to plan and initiate their hacking attacks on the US election.”

As the article reminds us, this is not the first instance of voter records leaking. Such leaks call into question how this keeps happening and make us wonder about any preventative measures. The last thing public perception of voting needs is the notion that casting a ballot puts one at risk of cyber attacks. Aren’t there already enough barriers in place to keep individuals from voting?


Megan Feil, February 25, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Brown Dog Fetches Buried Data

February 25, 2016

Outdated file formats, particularly those with no metadata, are especially difficult to search and utilize. The National Science Foundation (NSF) reports on a new search engine designed to plumb the unstructured Web in “Brown Dog: A Search Engine for the Other 99 Percent (of Data).” With the help of a $10 million award from the NSF, a team at the University of Illinois-based National Center for Supercomputing Applications (NCSA) has developed two complementary services. Writer Aaron Dubrow explains:

“The first service, the Data Access Proxy (DAP), transforms unreadable files into readable ones by linking together a series of computing and translational operations behind the scenes. Similar to an Internet gateway, the configuration of the Data Access Proxy would be entered into a user’s machine settings and then forgotten. From then on, data requests over HTTP would first be examined by the proxy to determine if the native file format is readable on the client device. If not, the DAP would be called in the background to convert the file into the best possible format….

“The second tool, the Data Tilling Service (DTS), lets individuals search collections of data, possibly using an existing file to discover other similar files in the data. Once the machine and browser settings are configured, a search field will be appended to the browser where example files can be dropped in by the user. Doing so triggers the DTS to search the contents of all the files on a given site that are similar to the one provided by the user….  If the DTS encounters a file format it is unable to parse, it will use the Data Access Proxy to make the file accessible.”
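
In other words, the DAP behaves like a translating gateway: check whether the client can read a file’s native format and, if not, convert it before handing it over. Below is a minimal sketch of that gateway idea under our own assumptions; the readable-format list and the converter hook are hypothetical stand-ins, not NCSA’s API.

    import requests

    # Hypothetical list of formats the client device can read natively.
    READABLE = {"text/plain", "text/csv", "application/json"}

    def convert(content: bytes, content_type: str) -> bytes:
        # Stand-in for the DAP's chained conversion operations; a real
        # proxy would invoke external translators here.
        raise NotImplementedError(f"no converter registered for {content_type}")

    def fetch_readable(url: str) -> bytes:
        """Fetch a file; pass it through if readable, otherwise translate it."""
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        ctype = resp.headers.get("Content-Type", "").split(";")[0].strip()
        if ctype in READABLE:
            return resp.content              # native format is readable
        return convert(resp.content, ctype)  # convert behind the scenes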

See the article for more on these services, which NCSA’s Kenton McHenry likens to a DNS for data. Brown Dog falls under NSF’s Data Infrastructure Building Blocks program, which supports development work that advances the field of data science.


Cynthia Murrell, February 25, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

No Evidence That Terrorists Are Using Bitcoin

February 23, 2016

If you were concerned that virtual currencies like Bitcoin were making things easier for Islamic State (aka IS, ISIS, ISIL, or Daesh), you can rest easy, at least for now. The International Business Times reports, “Isis: Bitcoin Not Used by Daesh.” That is the conclusion of a Europol investigation performed after last November’s attacks in Paris. Though some had suggested the terrorists were being funded with cyber money, investigators found no evidence of it.

On the other hand, the organization’s communication networks are thriving online through the Dark Web and a variety of apps. Writer Alistair Charlton tells us:

Better known by European law enforcement is how terrorists like IS use social media to communicate. The report says: “The internet and social media are used for communication and the acquisition of goods (weapons, fake IDs) and services, made relatively safe for terrorists with the availability of secure and inherently encrypted appliances, such as WhatsApp, Skype and Viber. In Facebook, VKA and Twitter they join closed and hidden groups that can be accessed by invitation only, and use coded language.”

Use of Tor, the anonymising browser used to access the dark web where sites are hidden from search engines like Google, is also acknowledged by Europol. “The use of encryption and anonymising tools prevent conventional observation by security authorities. There is evidence of a level of technical knowledge available to religiously inspired terrorist groups, allowing them to make their use of the internet and social media invisible to intelligence and law enforcement agencies.”

Of course, like any valuable technology, anonymizing apps can be used for weal or woe; they benefit marginalized peoples trying to make their voices heard as much as they do terrorists. Besides, there is no going back to a disconnected world now. My question is whether terrorists have taken the suggestion, and are now working on a Bitcoin initiative. I suppose we will see, eventually.


Cynthia Murrell, February 23, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Study Determines Sad News for People Who Look on Facebook “Likes” as Friendship

February 23, 2016

The article on the Independent titled “Facebook Friends Are Almost Entirely Fake, Study Finds” illuminates the cold, cold world of Facebook. According to the study, out of the hundreds of “friends” accumulated on Facebook, typically only about four are true blue buds. Most of them are not interested in your life or sympathetic to your problems. Two percent are actively trying to stab you in the back. I may have made up the last figure, but you get the picture. The article tells us,

“The average person studied had around 150 Facebook friends. But only about 14 of them would express sympathy in the event of anything going wrong. The average person said that only about 27 per cent of their Facebook friends were genuine. Those numbers are mostly similar to how friendships work in real life, the research said. But the huge number of supposed friends on a friend list means that people can be tricked into thinking that they might have more close friends.”

This is particularly bad news considering how Facebook has opened the gates to all populations, meaning that most people have family members on the site in addition to friends. Aunt Mary may have knit you a sweater for Christmas, but she really isn’t interested in your status update about running into your ex and his new girlfriend. If this article teaches us anything, it’s that you should look offline for your real relationships.


Chelsea Kerwin, February 23, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
