Can IBM Watermark Neural Networks?

August 8, 2018

Leave it to IBM to figure out how to put its stamp on its AI models. Of course, as with other intellectual property, AI code can be stolen, so this is a welcome development for the field. In the article “IBM Patenting Watermark Technology to Protect Ownership of AI Models” at Neowin, we learn the technology is still in development, and the company hasn’t even implemented it in-house yet. However, if all goes well, the technology may find its way into customer products someday. Writer Usama Jawad reports:

“IBM says that it showcased its research regarding watermarking models developed by deep neural networks (DNNs) at the AsiaCCS ’18 conference, where it was proven to be highly robust. As a result, it is now patenting the concept, which details a remote verification mechanism to determine the ownership of DNN models using simple API calls. The company explains that it has developed three watermark generation algorithms…

These use different methods; specifically:

  • Embedding meaningful content together with the original training data as watermarks into the protected DNNs,
  • Embedding irrelevant data samples as watermarks into the protected DNNs, and
  • Embedding noise as watermarks into the protected DNNs.

We learned:

“IBM says that in its internal testing using several datasets such as MNIST, a watermarked DNN model triggers an ‘unexpected but controlled response’.”
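
IBM has not released code for these algorithms, so the following is only a minimal, hypothetical sketch of the general trigger-set idea in Python with Keras, roughly corresponding to the third variant (noise as watermark). The dataset mirrors IBM’s mention of MNIST; everything else (sizes, labels, architecture) is our own invention:

```python
import numpy as np
from tensorflow import keras

# Load MNIST, the dataset IBM mentions in its internal testing.
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0

# Hypothetical watermark: 100 pure-noise images, all assigned the
# arbitrary "controlled response" label 7. A stolen copy of the model
# reproduces this response; an independently trained model does not.
rng = np.random.default_rng(seed=42)
wm_x = rng.random((100, 28, 28)).astype("float32")
wm_y = np.full(100, 7)

# Mix the watermark trigger set into the ordinary training data.
x_all = np.concatenate([x_train, wm_x])
y_all = np.concatenate([y_train, wm_y])

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_all, y_all, epochs=5, batch_size=128, verbose=0)

# Remote verification via simple API calls: query the suspect model
# with the trigger images and check for the controlled response.
preds = model.predict(wm_x).argmax(axis=1)
print("Watermark match rate:", (preds == 7).mean())
```

An independently trained model should classify the noise images essentially at random, so a high match rate on the trigger set is evidence of ownership, retrievable through ordinary prediction API calls.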

Jawad notes one drawback so far: though the software works well online, it cannot yet detect ownership when a model is deployed internally. From another article, “IBM Came Up With a Watermark for Neural Networks” at TheNextWeb, we spotted an interesting tidbit: writer Tristan Greene points out a distinct lack of code bloat from the watermark. This is an important factor for neural networks, which can be real resource hogs.

For more information, you may want to see IBM’s blog post on the subject or check out the associated research paper. Beyond Search wonders which smart software developers will use these techniques. Amazon, Facebook, Google, Oracle, Palantir Technologies? Universities with IBM research support may be more likely candidates, but that is, of course, speculation from rural Kentucky.

Cynthia Murrell, August 8, 2018

Insurance Risk? Let an Algorithm Decide

August 7, 2018

Perhaps Big Data will save us from the vexing problem of credit reports, in one industry at least. The SmartDataCollective posits, “Is Big Data Causing Insurance Actuaries to Move Away from Using Credit Scores?” For twenty-some-odd years, insurance companies have relied on credit scores to assess risks and set premiums. Whether bad credit really means someone is more likely to, say, get into a car accident is debatable, but no matter. It seems some actuaries now think predictive analytics will provide better gauges, but we suggest that could lead to a larger and more complex can of worms. What data do they consider, and what conclusions do they draw? I doubt we can expect much transparency here.

Writer Annie Qureshi explores why the use of credit scores by insurance agencies is problematic, then describes:

“This is why insurers are using big data to make more nuanced decisions about the credit risks that their customers present. They may find that certain variables that are incorporated into credit scoring algorithms overstate a customer’s dependability. A customer could have a high credit score, because they have made the vast majority of their payments on time over the past seven years and have used little of their debt. However, they may have recently started [running up] their credit card debt and missed three of the last seven payments on their existing insurance policy. This could be an indication that they have recently suffered a job loss or other financial setback, which is not reflected in their current credit score. There are other reasons that insurers are skeptical of using credit scores in the age of big data. One analysis shows that big data has helped insurers recognize that credit-based insurance policies are increasing the risk of unjust racial profiling.”

Indeed, but at the moment the data analytics field is suffering its own bias crisis (though a solution may be at hand). It will be interesting to see where this goes. Meanwhile, many of us would do well to be more careful what details we share online, since we cannot be sure how any tidbit may be used against us down the line.
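
Qureshi’s scenario above is simple to express as a toy model. This sketch (invented feature names and data, not any insurer’s actual system) trains a logistic regression on behavioral signals alongside the raw credit score, so a recent spike in missed payments can outweigh an otherwise pristine score:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: each row is
# [credit_score, recent_missed_payments, recent_utilization_jump],
# and the label is whether the customer later filed a costly claim.
X = np.array([
    [780, 0, 0.0],   # strong score, stable behavior
    [790, 3, 0.6],   # strong score, but recent distress signals
    [640, 0, 0.1],
    [600, 2, 0.5],
])
y = np.array([0, 1, 0, 1])

model = LogisticRegression().fit(X, y)

# A high credit score alone no longer guarantees a low predicted risk:
print(model.predict_proba([[790, 3, 0.6]])[0, 1])  # elevated risk
print(model.predict_proba([[780, 0, 0.0]])[0, 1])  # low risk
```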

Cynthia Murrell, August 7, 2018

IBM Turns to Examples to Teach AI Ethics

July 31, 2018

It seems that sometimes, as with humans, the best way to teach an AI is by example. That’s one key takeaway from VentureBeat’s article, “IBM Researchers Train AI to Follow Code of Ethics.” The need to program a code of conduct into AI systems has become clear, but finding a method to do so has proven problematic. Efforts to devise rules and teach them to systems are way too slow, and necessarily leave out many twists and turns of morality that (most) humans understand instinctively. IBM’s solution is to make the machine draw conclusions for itself by studying examples. Writer Ben Dickson specifies:

“The AI recommendation technique uses two different training stages. The first stage happens offline, which means it takes place before the system starts interacting with the end user. During this stage, an arbiter gives the system examples that define the constraints the recommendation engine should abide by. The AI then examines those examples and the data associated with them to create its own ethical rules. As with all machine learning systems, the more examples and the more data you give it, the better it becomes at creating the rules. … The second stage of the training takes place online in direct interaction with the end user. Like a traditional recommendation system, the AI tries to maximize its reward by optimizing its results for the preferences of the user and showing content the user will be more inclined to interact with. Since satisfying the ethical constraints and the user’s preferences can sometimes be conflicting goals, the arbiter can then set a threshold that defines how much priority each of them gets. In the [movie recommendation] demo IBM provided, a slider lets parents choose the balance between the ethical principles and the child’s preferences.”
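
IBM has not published the demo’s code, but the described balancing act is easy to sketch. In this minimal, hypothetical Python illustration of the online stage, the parent’s slider blends the ethical scores learned offline from the arbiter’s examples with the child’s preference scores (all names and numbers are ours):

```python
import numpy as np

def recommend(items, preference, ethics, slider):
    """Blend the online user-preference score with the ethical score
    learned offline; `slider` plays the role of the parent-controlled
    threshold (0 = preferences only, 1 = ethics only)."""
    scores = [(1 - slider) * preference[i] + slider * ethics[i] for i in items]
    return items[int(np.argmax(scores))]

# Toy movie demo: the child loves movie "B", but it scores poorly on
# the ethical constraints defined by the arbiter's examples.
items = ["A", "B", "C"]
preference = {"A": 0.4, "B": 0.9, "C": 0.6}  # stage 2: online user signal
ethics     = {"A": 0.9, "B": 0.1, "C": 0.8}  # stage 1: learned offline

print(recommend(items, preference, ethics, slider=0.2))  # -> "B"
print(recommend(items, preference, ethics, slider=0.8))  # -> "A"
```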

We’re told the team is also working to move beyond the yes/no model to more complex systems based on ranked priorities, for example. Dickson notes the technique can be applied to many other purposes, like calculating optimal drug dosages for certain patients in specific environments. It could also, he posits, be applied to problems like filter bubbles and smartphone addiction.

Beyond Search wonders if IBM’s ethical methods apply to patent enforcement, management of staff over 55 years old, and unregulated blockchain services. Annoying questions? I hope so.

Cynthia Murrell, July 31, 2018

An Algorithm for Fairness and Bias Checking

July 16, 2018

I like the idea of a meta algorithm. This particular meta algorithm is described in “New Algorithm Limits Bias in Machine Learning.” The write up explains what those working with smart software have known for—what is it?—decades? A century? Here’s the explanation of what happens when algorithms are slapped together:

But researchers have found that machine learning can produce unfair determinations in certain contexts, such as hiring someone for a job. For example, if the data plugged into the algorithm suggest men are more productive than women, the machine is likely to “learn” that difference and favor male candidates over female ones, missing the bias of the input. And managers may fail to detect the machine’s discrimination, thinking that an automated decision is an inherently neutral one, resulting in unfair hiring practices.

If you want to see how bias works, just run a query for “papa john pizza.” Google dutifully reports, via its smart algorithm, hits about Papa John’s founder getting evicted from his office, Papa John’s non-admission of racial bias, and colleges cutting ties with Papa John’s founder. Google also provides locations and a link to a Twitter account. The result displayed for me this morning (July 16, 2018) at 9:40 am US Eastern was:

[Screenshot: Google’s results for the query “papa john pizza”]

The only problem with my query “papa john pizza” is that I wanted the copycat recipe at this link. Google’s algorithm made certain that I would know about the alleged dust-up among and within the pizza empire and that I could navigate to a store in Louisville. The smart software made it quite difficult for me to locate the knock-off information. Sure, I could have provided Google with more clues about what I wanted: Six Sisters, the word “copycat,” the word “recipe,” and the word “ingredient.” But that’s what smart software is supposed to render obsolete. Boolean has no role in what algorithms expose to users. That’s why results are often interesting. That’s why smart software delivers off kilter results. The intent is to be useful. Often smart software is anything but.

Are the Google results biased? If I were Papa John, I might take umbrage at the three headlines.

Algorithms, if the write up is correct, will ameliorate this type of smart software dysfunctionality.

The article explains:

In a new paper published in the Proceedings of the 35th Conference on Machine Learning, SFI Postdoctoral Fellow Hajime Shimao and Junpei Komiyama, a research associate at the University of Tokyo, offer a way to ensure fairness in machine learning. They’ve devised an algorithm that imposes a fairness constraint that prevents bias.

One of the developers is quoted as saying:

“So say the credit card approval rate of black and white [customers] cannot differ more than 20 percent. With this kind of constraint, our algorithm can take that and give the best prediction of satisfying the constraint,” Shimao says. “If you want the difference of 20 percent, tell that to our machine, and our machine can satisfy that constraint.”
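
The write up does not spell out Shimao and Komiyama’s algorithm, which builds the fairness constraint into training itself. A cruder post-processing stand-in conveys the flavor of the quoted 20 percent rule, using invented data (this is an illustration, not their method):

```python
import numpy as np

def approval_gap(scores, groups, thresholds):
    """Approval rate per group at that group's threshold, plus the gap."""
    rates = {g: float(np.mean(scores[groups == g] >= thresholds[g]))
             for g in (0, 1)}
    return abs(rates[0] - rates[1]), rates

# Invented data: credit scores from some upstream model, with group 1
# systematically scored lower (a simulated biased input).
rng = np.random.default_rng(0)
scores = np.clip(rng.normal(0.55, 0.2, 1000), 0, 1)
groups = rng.integers(0, 2, 1000)
scores[groups == 1] -= 0.2

# Start from one shared cutoff, then relax group 1's cutoff until the
# approval-rate difference satisfies the quoted 20 percent constraint.
thresholds = {0: 0.6, 1: 0.6}
gap, rates = approval_gap(scores, groups, thresholds)
while gap > 0.20:
    thresholds[1] -= 0.01
    gap, rates = approval_gap(scores, groups, thresholds)

print(f"gap={gap:.2f}  rates={rates}  thresholds={thresholds}")
```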

Just one question: What if a system incorporates two or more fairness algorithms?

Perhaps a meta fairness algorithm will herd the wandering sheep? Georg Cantor was troubled by this type of infinity-of-infinities issue.

Fairness may be in the eye of the beholder. The statue of justice wears a blindfold, not old people magnifiers. Algorithms? You decide. Why not order a pizza or make your own clone of a Papa John pizza if you can find the recipe. Pizza and algorithms to verify algorithms. Sounds tasty.

If I think about algorithms identifying fake news, I may need to order maximum strength Pepcid and receive many, many smart advertisements from Amazon.

Stephen E Arnold, July 16, 2018

Markov: Two Brothers and Chaining Hope to a Single Method for Efficiency

July 4, 2018

I am no math guy. I am no Googler. I am just an old person related to a semi-capable math person named V.I. Arnold. That Arnold knew of the Markov guys because those who assisted Kolmogorov sort of kept in touch with stochastic methods.

This is recent news in math history. Andrey Andreyevich Markov died in 1922 when my uncle was a very young math prodigy. His brother Vladimir died in 1897.

Who cares?

I do sort of.

I read “Can Markov Logic Take Machine Learning to the Next Level?” From my point of view, the short answer is, “Not really.”

Machine learning requires a number of numerical recipes. Truth be told, most of these methods have been around a long time. The methods are taught by university profs and even discussed in IBM sales engineers’ briefings. (Yep, at least they were once upon a time.)

The write up explains Pedro Domingos’ insight. The article does not make clear that Dr. Domingos’ work has influenced the Google smart software effort. In fact, Google has, like Amazon, deep affection for the University of Washington. Dr. Jeff Dean, I have heard, shares a warm spot in his heart for the university.

The write up presents some of Dr. Domingos’ insights about Markov and Markov logic.

The key point for me is that as useful as the Russian brothers’ ideas are, there is more to machine learning than a single approach.

In fact, I find this statement from the article interesting:

The productivity advantages of Markov Logic may be too great to ignore. A deep learning machine that takes tens of thousands of lines of code in a traditional language could be expressed with just a few Markov Logic formulas, Domingos says. “It’s not completely push-button. Markov Logic is not at that stage. There’s still the usual playing around with things you have to do,” he says. “But your productivity and how far you can get is just at a different level.”
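
For readers wondering what one of those “few formulas” buys: in a Markov Logic Network (the standard Richardson and Domingos formulation, not anything specific to this article), each first-order formula carries a weight, and the probability of a possible world rises with the number of true groundings of each formula:

```latex
% w_i: weight of formula F_i; n_i(x): number of true groundings of F_i in world x
P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big)
% Z is the partition function, summing the exponential over all possible worlds.
```

Raising a single weight shifts probability mass toward worlds that satisfy that formula more often, which is how a handful of weighted formulas can stand in for thousands of lines of hand-built model code.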

A few formulas. Interesting idea. How will one explain what comes out of a machine learning process if regulations about transparency for smart software become a reality?

Those who want to understand what smart software does may have to become familiar with the work of the Markov guys. That’s probably unrealistic. Therefore, figuring out how machine intelligence works is likely to be a challenge.

Now let’s get that accuracy of facial recognition systems above the 75 percent level on University of Washington tests.

Stephen E Arnold, July 4, 2018

Is Google Playing Defense?

May 31, 2018

The Search Engine Roundtable reports, “Google Has a Bias Towards Scientific Truth in Search.” Great! Now what about reproducible scientific studies?

This defense of a slant toward verifiable truth was made by Google engineer Paul Haahr on Twitter after someone questioned the impartiality of his company’s “quality raters guidelines,” section 3.2 (reproduced for our convenience in the write-up). The guidelines consider consensus and subject-matter expertise in search rankings, a position one Twitter user took issue with. Writer Barry Schwartz lets that thread speak for itself, so see the write-up for the back-and-forth. The engineer’s challenger basically questions Google’s right to discern good sources from bad (which, I’d say, is the basic job of a search engine). This is Haahr’s side:

“We definitely do have a bias towards, for example, what you call ‘Scientific Truth,’ where the guidance in section 3.2 says ‘High quality information pages on scientific topics should represent well-established scientific consensus on issues where such consensus exists. […]

‘It’s the decision we’ve made: we need to be able to describe what good search results are. Those decisions are reflected in our product. Ultimately, someone who disagrees with our principles may want a different product; there may be a market niche for them. […]

‘I think it’s the only realistic model if you want to build a search engine. You need to know what your objective in ranking is. Evaluation is central to the whole process and that needs clarity on what “good” means. If you don’t describe it, you only get noise.’”

The write-up concludes with this question from Haahr—if Google’s search results are bad, is it because they are too close to their guidelines, or too far away?

Cynthia Murrell, May 31, 2018

Google: Excellence Evolves to Good Enough

May 25, 2018

I read “YouTube’s Infamous Algorithm Is Now Breaking the Subscription Feed.” I assume the write up is accurate. I believe everything I read on the Internet.

The main point of the write up seems to me to be that good enough is the high water mark.

I noted this passage, allegedly output by a real, thinking Googler:

Just to clarify. We are currently experimenting with how to show content in the subs feed. We find that some viewers are able to more easily find the videos they want to watch when we order the subs feed in a personalized order vs always showing most recent video first.

I also found this statement interesting:

With chronological view thrown out, it’s going to become even more difficult to find new videos you haven’t seen — especially if you follow someone who uploads at a regular time each day.

I would like to mention that Google, along with In-Q-Tel, invested in Recorded Future. That company has some pretty solid date and time stamping capabilities. Furthermore, my hunch is that the founders of the company know the importance of time metadata to some of the Recorded Future customers.

What would happen if Google integrated some of Recorded Future’s time capabilities into YouTube and into good old Google search results?

From my point of view, good enough means “sells ads.” But I am usually incorrect, and I expect to learn just how off base I am when I explain how one eCommerce giant is about to modify the landscape for industrial strength content analysis. Oh, that company’s technology does the date and time metadata pretty well.

More on this mythical “revolution” on June 5th and June 6th. In the meantime, try and find live feeds of the Hawaii volcano event using YouTube search. Helpful, no?

Stephen E Arnold, May 25, 2018

IBM: Just When You Thought Crazy Stuff Was Dwindling

May 19, 2018

How has IBM marketing reacted to the company’s Watson and other assorted technologies? Consider IBM and quantum computing. That’s the next big thing, just as soon as the systems become scalable. And the problem of programming? No big deal. What about applications? Hey, what is this, a reality roll call?

Answer: Yes, plus another example of IBM predicting the future.

Navigate to “IBM Warns of Instant Breaking of Encryption by Quantum Computers: ‘Move Your Data Today’.”

I like that “warning.” I like that “instant breaking of encryption.” I like that command: “Move your data today.”

Hogwash.

[Image: a hog in mud]

IBM’s quantum computing can solve encryption problems instantly. Can this technology wash this hog? The answer is that solving encryption instantly and cleaning this dirty beast both remain highly improbable. To verify this hunch, let’s ask Watson.

The write up states with considerable aplomb:

“Anyone that wants to make sure that their data is protected for longer than 10 years should move to alternate forms of encryption now,” said Arvind Krishna, director of IBM Research.

So, let me get this straight. Quantum computing can break encryption instantly. I am supposed to move to an alternate form of encryption. But if encryption can be broken instantly, why bother?

That strikes me as a bit of the good old tautological reasoning which leads exactly to nowhere. Perhaps I don’t understand.
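
One possible resolution of the apparent tautology, assuming the usual reading of such advice: “alternate forms of encryption” means quantum-resistant schemes (for example, lattice-based ones), which Shor’s algorithm is not known to break. Shor’s algorithm threatens RSA specifically because it factors an integer N in time polynomial in the bit length, while the best known classical attack, the general number field sieve, is sub-exponential:

```latex
% Shor's quantum factoring: polynomial in the bit length log N (roughly cubic)
T_{\mathrm{Shor}}(N) = O\big((\log N)^{3}\big)

% General number field sieve, the best known classical factoring attack:
T_{\mathrm{GNFS}}(N) = \exp\Big(\big((64/9)^{1/3} + o(1)\big)\,(\ln N)^{1/3}(\ln\ln N)^{2/3}\Big)
```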

I learned:

The IBM Q is an attempt to build a commercial system, and IBM has allowed more than 80,000 developers [to] run applications through a cloud-based interface. Not all types of applications will benefit from quantum computers. The best suited are problems that can be broken up into parallel processes. It requires different coding techniques. “We still don’t know which applications will be best to run on quantum computers,” Krishna said. “We need a lot of new algorithms.”

No kidding. Now we need numerical recipes, and researchers have to figure out what types of problems quantum computing can solve?

We have some dirty hogs in Harrod’s Creek, Kentucky. Perhaps IBM’s quantum cloud computing thing which needs algorithms can earn some extra money. You know that farmers in Kentucky pay pretty well for hog washing.

Stephen E Arnold, May 19, 2018

Text Classification: Established Methods Deliver Good Enough Results

April 26, 2018

Short honk: If you are a cheerleader for automatic classification of text-centric content objects, you are convinced that today’s systems are home run hitters. If you have some doubts, you will want to scan the data in “Machine Learning for Text Categorization: Experiments Using Clustering and Classification.” The paper was free when I checked at 9:20 am US Eastern time. For the test sets, Latent Dirichlet Allocation performed better than other widely used methods. Worth a look. From my vantage point in Harrod’s Creek, automated processes, regardless of method, perform in a manner one expert explained to me at Cebit several years ago: “Systems are good enough.” Improvements are now incremental and, like wringing the last few percentage points of pollutants out of a catalytic converter, an expensive and challenging engineering task.
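
For readers who want to poke at the LDA result themselves, here is a minimal sketch (our own construction with scikit-learn, not the paper’s datasets or setup): LDA reduces each document to a topic-proportion vector, and a plain classifier is trained on those features.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Two-category toy task: LDA topic proportions as classification features.
cats = ["sci.space", "rec.autos"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

pipeline = make_pipeline(
    CountVectorizer(max_features=5000, stop_words="english"),
    LatentDirichletAllocation(n_components=20, random_state=0),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(train.data, train.target)
print("accuracy:", pipeline.score(test.data, test.target))
```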

Stephen E Arnold, April 26, 2018

Quote to Note: Statistics May Spoil Like Bananas

April 13, 2018

I noticed this synopsis for a talk by Andrew Gelman, a wizard who teaches at Columbia University. You can find the summary in “Do Statistical Methods Have an Expiration Date?” Here’s the quote I noted:

The statistical methods which revolutionized science in the 1930s-1950s no longer seem to work in the 21st century. How can this be? It turns out that when effects are small and highly variable, the classical approach of black-box inference from randomized experiments or observational studies no longer works as advertised.
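
Gelman’s claim is easy to demonstrate with a short simulation (toy numbers ours, not his): when the true effect is small relative to the noise, the estimates that survive a classical significance filter exaggerate it several-fold and occasionally carry the wrong sign.

```python
import numpy as np

rng = np.random.default_rng(1)

# A small, highly variable effect studied with classical significance
# testing: true effect = 0.1, noise standard deviation = 1, n = 50.
true_effect, n, sims = 0.1, 50, 10_000
estimates = []
for _ in range(sims):
    sample = rng.normal(true_effect, 1.0, n)
    mean, se = sample.mean(), sample.std(ddof=1) / np.sqrt(n)
    if abs(mean / se) > 1.96:            # "statistically significant"
        estimates.append(mean)

# The significance filter: surviving estimates exaggerate the truth,
# and a few even point the wrong way.
sig = np.array(estimates)
print("share significant:", len(sig) / sims)
print("mean |significant estimate| vs truth:", np.abs(sig).mean(), "vs", true_effect)
print("wrong sign among significant:", (sig < 0).mean())
```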

What happens when these methods are bolted into next generation data analytics systems which humans use to make decisions? My great uncle (Vladimir I. Arnold) and his co-worker Andrey Kolmogorov could calculate an answer, I assume.

Stephen E Arnold, April 13, 2018

