Ambercite: A Patent Similarity Service

July 20, 2017

We learned about an Australian startup called Ambercite. The company’s service helps those who want an answer to a question like this:

What patents are similar to US7593939?

Most of the online patent search systems do not deliver quick, comprehensive similarity results. When I have to think about patent similarity, I have found that several services have to be consulted and then some old-fashioned, billable time must be generously applied. Ambercite wants to change this approach to one powered by a more practical system. The company says:

Ambercite can help you quickly find patents and commercial opportunities, in many cases, missed by others, with its tools and services.

For more information about the firm, point your browser to this link. Worth watching.

Stephen E Arnold, July 20, 2017

TechnoSecurity & Digital Forensics Conference Info

July 20, 2017

I am giving two talks about the Dark Web at the September 2017 TechnoSecurity & Digital Forensics Conference. With the takedown of AlphaBay and the attention Dark Web sources of synthetic drugs are getting in the mainstream media, the sessions will be of particular relevance to law enforcement, security, and intelligence professionals. My first talk is a quick start basics lecture. My second presentation focuses on free and open source tools and the commercial services which can flip on the lights in the Dark Web.

The conference has emerged as one of the most important resources for corporate network security professionals, federal, state and local law enforcement digital forensic specialists, and cybersecurity industry leaders from around the world. The purpose is to raise international awareness of developments, teaching, training, responsibilities, and ethics in the field of IT security and digital forensics. The event will feature more than 70 speakers, 60 sessions, 20 new product demonstrations, and 25 sponsors and exhibits. For full details and to register, please visit www.TechnoSecurity.us.

As a reader of Beyond Search, you qualify for a 30 percent discount. Just use the promotional code DKWB17 when you sign up online.

Stephen E Arnold, July 20, 2017

Google: Recycling and Me Too-ing

July 20, 2017

Quite a week for the Google. The company’s Glass product is now positioned as a tool for the world of the enterprise, not the world of the low cost Snap glasses. Snap glasses are available on Amazon for $129.

Google informs me that “We’ve all been busy.” Nah, I have not been busy no matter what Google asserts.

Someday I will recount some of the information I collected when Google Glass was a fashion thing, a home wrecker, and a mechanism for destabilizing a Silicon Valley whiz kid. But not now, not in this post about recycling and me too-ing.

The recycle part is wrapped, is it not? Google Glass is back as a non-fashion statement. Recycling is good. Newspapers, plastic bottles, and heads up displays which work until the battery dies or the online connection is lost.

Now the me too-ing.

I read “Google Formally Announces Hire, Its LinkedIn Competitor.” That pretty much tells the story. LinkedIn, the job hunting and self promoting engine loved by many folks who want to be in the top one percent, is part of Microsoft. Google, which wants to be the 21st century Microsoft in order to do something other than sell online ads, finds the job hunting and self promotion sector promising. Well, maybe it will annoy Microsoft and take a bite out of that company’s efforts to be more than a vendor of apps and laptops covered in synthetic fabric.

The idea, as I understand the write up’s version, is:

Google has formally introduced Hire, a recruiting app for small- and medium-sized businesses, which also integrates seamlessly with G Suite…Google has announced Hire, an app that provides a recruiting platform aimed towards US businesses with under 1,000 employees. Hire makes it easier for companies to find suitable candidates for jobs, and manage the interview process efficiently. The app is further aided by seamless integration with Google’s G Suite, which over three million businesses use.

The service looks like “LinkedIn Light” from my vantage point in Harrod’s Creek. But what’s interesting to me is that Google has a dossier invention which creates profiles of people from disparate sources of information. If my memory is working this morning, the example I learned about takes items from multiple databases and assembles a profile. The case example was a snapshot of Michael Jackson. The report was a dossier which included aliases like “Jocko”, pop culture effluvia, and some substantive stuff like location. The presentation seemed quite similar to what is called a bubble gum card in certain circles.

If Google keeps wood behind this project, perhaps the dossier type function will become available. That would be more useful to me than a self promotion profile on LinkedIn. For now, Google seems content to do the me too thing in order to nibble away at Microsoft’s multi billion bet on a social media platform for “professionals,” whatever that term means. Is it possible Google wants to remind the Microsofties that the GOOG wishes to see the company fade into the sunset or buy ads on Google to promote its fabric covered laptop?

I am okay with “LinkedIn Light” because it has a bit of a kick unlike low cal me too alternatives. Google’s innovation balloons may not be able to take off.

Stephen E Arnold, July 20, 2017

Software That Detects Sarcasm on Social Media

July 20, 2017

The Faculty of Industrial Engineering and Management at the Technion-Israel Institute of Technology has developed Sarcasm SIGN, software that can detect sarcasm in social media content. People with learning difficulties will find this tool useful.

According to an article published by Digital Journal titled Software Detects Sarcasm on Social Media:

The primary aim is to interpret sarcastic statements made on social media, be they Facebook comments, tweets or some other form of digital communication.

As we move towards a more digitized world where the majority of our communications are through digital channels, people with learning disabilities are at the receiving end. As machine learning advances, so do natural language capabilities. Tools like these will be immensely helpful for people who are unable to understand the undertones of communication.

The same tool can also be utilized by brands for determining who is talking about them in a negative way. Now ain’t that wonderful, Facebook?

Vishal Ingole, July 20, 2017

A Potentially Useful List of Enterprise Search Engine Servers

July 20, 2017

We found a remarkable list at Predictive Analytics Today—“Top 23 Enterprise Search Engine Servers.” The write-up introduces its roster of resources:

Enterprise Search is the search information within an enterprise, searching of content from multiple enterprise-type sources, such as databases and intranets. These search systems index data and documents from a variety of sources including file systems, intranets, document management systems, e-mail, and databases. Enterprise search systems also integrate structured and unstructured data in their collections and also use access controls to enforce a security policy on their users.

Entries are logically presented under two categories, proprietary solutions and open source software. From Algolia to Xapian, the article summarizes pros and cons of each. See the post for details.

However, we have a few notes to add about some particular platforms. For example, the Google Search Appliance has been discontinued, though Constellio is still going… in Canada. SearchBlox is now based on Elasticsearch, and SRCH2 was originally designed for mobile searches. Also, isn’t Sphinx Search specifically for SQL data? Hmm. We suggest this list could make a good springboard, but server shoppers should take its specifics with a grain of salt and be sure to do their own follow-up research.

Cynthia Murrell, July 20, 2017

IBM Watson: Two Views of the Same Pile of Tinker Toys

July 19, 2017

I find IBM an interesting outfit to watch. But more entertaining is watching how the Watson product and service is perceived by smart people. On the side of the doubters is a Wharton grad, James Kisner, who analyzes for a living at Jefferies & Co. His report “Creating Shareholder Value with AI? Not So Elementary, My Dear Watson?” suggests that IBM is struggling to make its big bet pay off. If not a Google moon shot, Mr. Kisner thinks the low orbit satellite launch is in an orbit which will result in Watson burning up upon re-entry to reality.

The Big Dog of artificial intelligence and smart software may be a Chihuahua dressed up like a turkey, not a very big dog, not much of a bark, and certainly not equipped to take a big bite out of a Wharton trained analyst’s foot.

On the rah rah side is Vijay, a blogger who does not put his name on his blog or on his About page. (One of my intrepid researchers thinks this Vijay’s last name is “Vijayasankar.” Maybe?) I assume he is famous, just not here in Harrod’s Creek. His most recent write up about Watson is “IBM Watson Is Just Fine, Thank You!” His motivation for the write up is the attention given to the Jefferies report. He is a straight shooter; for example:

I am a big fan of criticism of technology – and as folks who have known me over time can vouch, I seldom hold back what is in my mind on any topic. I strongly believe that criticism is healthy for all of us – including businesses, and without it we cannot grow. If you go through my previous blogs, you can see first hand how I throw cold water on hype.

I like the cold water on hype from a person who is an IBM executive, and one who has been involved in the IBM Watson health initiatives. (I think this includes the star crossed Anderson project in Houston. I hear, “Houston, we have a problem,” but you may not.) I highlighted these points in this blog post:

  1. Hey, world, IBM is an enterprise product, not a consumer product. This seems obvious, but apparently IBM’s ability to communicate what it is selling and to whom is not working at peak efficiency or maybe not working because everyone is confused about Watson?
  2. IBM does not do the data federation things with its customer data. That’s good. I know that IBM sells a mainframe that encrypts everything. Interesting but I am not sure how this addresses flat revenue growth, massive layoffs, and the baffling Watson marketing which recently had a white cube floating in a tax preparer’s office. A white cube?
  3. IBM Watson has lots of successes. That’s a great assertion. The problem is that Watson started out as the next big thing. There was a promise of billions in revenue. There was a big office commitment in Manhattan. Then there was the implosion at the Houston health center. “Watson, do you read me?” I once tracked some of the Watson craziness in a series called the “Weakly Watson.” I gave up. The actual examples struck me as a painful type of fake news. What’s interesting is that the “weakly” stories were “real.” Scary to me and to stakeholders.
  4. Watson is not a product. Watson is an API to the IBM ecosystem. Vendor lock in beckons. And, of course, lots of APIs. These digital tinker toys can be snapped together. The problems include the cost and time required for system training, the consulting and engineering services price tag, and the massaging required to explain that Watson is something that requires a lot of work. For the Instagram crowd that’s a problem. “Houston. Houston. Do you copy? Tinker toys. Lego blocks. Do you copy?”
  5. Watson “some times needs consulting.” Talk about an understatement. Watson needs lots and lots of consulting, engineering services, training, configuring, tuning, and training. Because Watson is a confection of open source, acquired technologies, and home brew code, a lot of work is needed. That’s because Watson was designed to generate high margin services, not the trivial revenue from online ads or from people ordering laundry detergent by pressing a button on their washing machine.
  6. Watson has two things in its bag of tricks: “Great marketing” and “AI talent.” Okay, marketing and smart people. The basic problem IBM has to solve before investors get frisky is generating significant, sustainable revenues and healthy margins. Spending money buys marketing and people. Effective management orchestrates what can be bought into stuff that can be sold at a profit.

The Vijay write up ends with a question. Here you go: “So why is IBM not publishing Watson revenue specifically?” This Vijay fellow who assumes that I know his last name does not answer the question. In the deafening silence, we need an answer.

That brings me to the Jefferies & Co. report by James Kisner, who is certified to do financial analysis. The answer to Vijay’s question consumes 53 pages of verbiage, charts, and tables of numbers. The entire document was available on July 18, 2017, at this link, but it may disappear. Many analyst documents disappear for the average guy. (If the link is dead, head over to Investext or give Jefferies & Co. a quick call to see if that will get you the meaty document.)

A Jefferies & Co. analyst with teeth bites into the IBM financial data and seems to be unsatisfied.

In a nutshell, the Jefferies report says that IBM Watson is a limp noodle. Among the Watson characteristics are unhappy customers, wild and crazy marketing, misfires on deep learning, and the incredibly difficult, time consuming, and expensive data preparation required to make the system say, “Woof, woof” or maybe “Wolf, wolf” when there is something important for a human to notice.

Net net: IBM’s explanations of Watson have not produced the revenues and profits stakeholders expect. Jefferies & Co. goes MBA crazy providing a wide range of data to support the argument that Watson is struggling.

That “woof, woof” is the sound of a Chihuahua barking with the help of IBM spokespeople and lots of PR and marketing minions. The Wharton guy is a larger dog, barks ferociously, and has a bite backed up by data. IBM has to prove that it can solve problems for clients, generate sustainable revenue, and keep the competition from chowing down on a Watson weighted down with digital tinker toys.

Stephen E Arnold, July 19, 2017

ArnoldIT Publishes Technical Analysis of the Bitext Deep Linguistic Analysis Platform

July 19, 2017

ArnoldIT has published “Bitext: Breakthrough Technology for Multi-Language Content Analysis.” The analysis provides the first comprehensive review of the Madrid-based company’s Deep Linguistic Analysis Platform or DLAP. Unlike most next-generation multi-language text processing methods, Bitext has crafted a platform. The document can be downloaded from the Bitext Web site via this link.

Based on information gathered by the study team, the Bitext DLAP system outputs metadata with an accuracy in the 90 percent to 95 percent range. Most content processing systems today typically deliver metadata and rich indexing with accuracy in the 70 to 85 percent range.

According to Stephen E Arnold, publisher of Beyond Search and Managing Director of Arnold Information Technology:

“Bitext’s output accuracy establishes a new benchmark for companies offering multi-language content processing systems.”

The system performs, in near real time, more than 15 discrete analytic processes. The system can output enhanced metadata for more than 50 languages. The structured stream provides machine learning systems with a low cost, highly accurate way to learn. Bitext’s DLAP platform integrates more than 30 separate syntactic functions. These include segmentation, tokenization (word segmentation, frequency, and disambiguation), among others. The DLAP platform analyzes more than 15 linguistic features of content in any of the more than 50 supported languages. The system extracts entities and generates high-value data about documents, emails, social media posts, Web pages, and structured and semi-structured data.

DLAP applications range from fraud detection to identifying nuances in streams of data; for example, the sentiment or emotion expressed in a document. Bitext’s system can output metadata and other information about processed content as a feed stream to specialized systems such as Palantir Technologies’ Gotham or IBM’s Analyst’s Notebook. Machine learning systems such as those operated by such companies as Amazon, Apple, Google, and Microsoft can “snap in” the Bitext DLAP platform.

Copies of the report are available directly from Bitext at https://info.bitext.com/multi-language-content-analysis. Information about Bitext is available at www.bitext.com.

Kenny Toth, July 19, 2017

Big Data in Biomedical

July 19, 2017

The biomedical field, which is replete with unstructured data, is all set to take a giant leap towards standardization with the Biological Text Mining Unit.

According to PHYS.ORG, in a peer-reviewed article titled Researchers Review the State-Of-The-Art Text Mining Technologies for Chemistry, the author states:

Being able to transform unstructured biomedical research data into structured databases that can be more efficiently processed by machines or queried by humans is critical for a range of heterogeneous applications.

Scientific data has a fixed vocabulary, which makes standardization and indexing easier. However, most big names in Big Data and enterprise search are concentrating their efforts on e-commerce.

Hundreds of new compounds are discovered every year. If the data pertaining to these compounds is made available to other researchers, advancements in this field will be very rapid. The major hurdle is that the data is in an unstructured format, which the Biological Text Mining Unit standards intend to overcome.

Vishal Ingole, July 19, 2017

Study: Social Media and Young People

July 19, 2017

Some of us elders have been saying it for years, but now research seems to confirm it: social media can be bad for mental health. The Next Web reports, “Study: Snapchat and Instagram Are the Worst for Young People.” The study is from the UK’s Royal Society for Public Health (RSPH), and the “young people” sampled are 1,479 Brits aged 14-24. An explanatory three-minute video from the RSPH accompanies the article. Writer Rachel Kaser reports:

The researchers surveyed 1,479 British youths ages 14-24, asking them how they felt the different social media networks affected their mental health. They took in several factors such as body image, sleep deprivation, bullying, and self-identity. The results suggest the two worst social media networks for kids are Instagram and Snapchat, as they had terrible scores for body image, bullying, and anxiety. Twitter and Facebook weren’t much better, though. YouTube was the only one that apparently inspired more positive feelings than negative ones. It could be because Snapchat and Instagram are image-based apps, meaning it’s not easy for users to avoid visual comparisons. Both apps ranked high on ‘Fear of Missing Out,’ and the researchers suggested this was likely to foster anxiety in fellow users.

I recommend the video for interested readers. It shows some respondents’ answers to certain questions, and clearly summarizes the pros and cons of each platform examined. It helpfully concludes with a list of concrete suggestions: Implement pop-up notifications that tell users when they’ve been online for a certain amount of time; require watermarks on photos that have been digitally altered; educate folks on the healthy use of social media; and incorporate analysis tools to identify users at risk for poor mental health and “discreetly” steer them toward help. It does seem such measures could help; will social-media companies cooperate?

Cynthia Murrell, July 19, 2017

HonkinNews for July 18, 2017 Now Available

July 18, 2017

IBM’s never-ending marketing of all things artificial captures more air time in this week’s HonkinNews. Open Text, a Swiss Army knife-like enterprise software company, shifts to a new direction. Instead of marketing its proprietary search and retrieval software, Open Text has channeled IBM Watson. Open Text’s Magellan is an open source solution which looks like a “me too” product to the HonkinNews goose. This week’s program explains that Stephen E Arnold will deliver two one hour Dark Web lectures at the September 2017 TechnoSecurity & Digital Forensics Conference. HonkinNews also explains how to get information about Stephen’s new book “Dark Web Notebook.” The secret is to search Google for the words “Arnold Dark Web Notebook.” Google is back on the HonkinNews radar. No, it’s not the company’s purchase of another smart software firm. This one is in central India. Google, according to the Wall Street Journal, engages in content marketing. This means Google pays “experts” to write positive articles and reports about Google. This week’s HonkinNews includes a test to help you determine if you have what it takes to be a “real” journalist. Don’t worry. Stephen includes a hint so you can score a 100 on this tough exam. We thought Yahoo was but a memory. Wrong. Verizon seems to have diffused some Yahoot DNA through its corporate body. With the loss of some customer data, there are several million people who might wonder if Yahoot is a forever thing. You can view the video at this link.

Kenny Toth, July 18, 2017
