Algorithmic Bias and Unintentional Discrimination in Results

October 21, 2015

The article titled When Big Data Becomes Bad Data on Tech In America discusses the legal ramifications for companies of relying on algorithms. The “disparate impact” theory has been used in the courtroom for some time to strike down discriminatory policies whether or not they were created with the intent to discriminate. Algorithmic bias occurs all the time, and under the spirit of the law it discriminates, however unintentionally. The article states:

“It’s troubling enough when Flickr’s auto-tagging of online photos label pictures of black men as “animal” or “ape,” or when researchers determine that Google search results for black-sounding names are more likely to be accompanied by ads about criminal activity than search results for white-sounding names. But what about when big data is used to determine a person’s credit score, ability to get hired, or even the length of a prison sentence?”

The article also reminds us that data can often be a reflection of “historical or institutional discrimination.” Under disparate impact theory, the question of human intent is irrelevant; the only thing that matters is whether the results are biased. Legal scholars and researchers are arguing for ethical machine learning design that roots out algorithmic bias, but stronger regulations and better oversight of the algorithms themselves might be the only way to avoid time in court.
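For readers wondering how “disparate impact” gets measured in practice, one common yardstick is the four-fifths rule: a selection rate for any group below 80 percent of the highest group’s rate is treated as evidence of adverse impact. A minimal sketch of that check in Python follows; the outcome data and group labels are hypothetical.

from collections import Counter

def selection_rates(outcomes):
    """Per-group selection rates from (group, selected) pairs."""
    totals, picked = Counter(), Counter()
    for group, selected in outcomes:
        totals[group] += 1
        picked[group] += selected
    return {g: picked[g] / totals[g] for g in totals}

def four_fifths_flags(outcomes, threshold=0.8):
    """Flag groups selected at under 80% of the best group's rate."""
    rates = selection_rates(outcomes)
    best = max(rates.values())
    return {g: rate / best < threshold for g, rate in rates.items()}

# Hypothetical screening results: (group, 1 if advanced, 0 if rejected).
outcomes = [("A", 1)] * 60 + [("A", 0)] * 40 + [("B", 1)] * 30 + [("B", 0)] * 70
print(four_fifths_flags(outcomes))  # {'A': False, 'B': True}: adverse impact on B

No algorithm needs to mention a protected class to fail this test; biased outputs alone trigger it, which is precisely the article’s point about intent being beside the point.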

Chelsea Kerwin, October 21, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

The State Department Delves into Social Media

October 13, 2015

Social media platforms are built by people and companies that want to expand how people communicate.  Facebook was invented to take advantage of the digital real-time environment, keeping people in contact through a web of connections.  Twitter was founded for quicker, more instantaneous communication based on short 140-character blurbs.  Instagram shares pictures, and Pinterest connects ideas via pictures and related topics.  Using analytics, the social media companies and other organizations collect data on users and use that information to sell products and services as well as to understand the types of users on each platform.

Social media contains a variety of data that can benefit not only private companies but government agencies as well.  According to GCN’s “State Starts Development On Social Media And Analytics Platform,” the system will let staff collaborate and contribute in real time, schedule and publish across many social media platforms, and will be mobile-enabled.  The platform will also be used to track analytics on social media:

“For analytics, the system will analyze sentiment, track trending social media topics, aggregate location and demographic information, rank of top multimedia content, identify influencers on social media and produce automated and customizable reports.”

The platform will support twenty users and track thirty million mentions each year.  The purpose behind the social media and analytics platform is still vague, but the federal government has a record of lagging in its understanding and adoption of modern technology.  This appears to be a step toward upgrading so it does not get left behind, although a platform that analyzes social media data arguably should have been implemented years ago, at the start of the big data phenomenon.
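Trend tracking of the sort GCN describes can be approximated with simple counting over a sliding window of recent posts. The sketch below is a hypothetical illustration, not the State Department’s design; the window size and sample posts are invented.

from collections import Counter, deque

class TrendTracker:
    """Count hashtag mentions across the most recent posts."""
    def __init__(self, window=10000):
        self.posts = deque()   # tag lists for posts still in the window
        self.window = window
        self.counts = Counter()

    def add(self, post):
        if len(self.posts) == self.window:
            for tag in self.posts.popleft():   # evict the oldest post
                self.counts[tag] -= 1
                if self.counts[tag] <= 0:
                    del self.counts[tag]
        tags = [w.lower() for w in post.split() if w.startswith("#")]
        self.posts.append(tags)
        self.counts.update(tags)

    def trending(self, n=5):
        return [tag for tag, _ in self.counts.most_common(n)]

tracker = TrendTracker()
tracker.add("Great panel on #opendata at the summit")
tracker.add("#opendata portals keep multiplying")
print(tracker.trending())  # ['#opendata']

Sentiment scoring, demographics, and influencer ranking would layer further models on top, but the mechanics of “track trending social media topics” start this simply.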

Whitney Grace, October 13, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Business Intelligence and Data Science: There Is a Difference

October 6, 2015

An article at the SmartDataCollective, “The Difference Between Business Intelligence and Real Data Science,” aims to help companies avoid a common pitfall. Writer Brigg Patton explains:

“To gain a competitive business advantage, companies have started combining and transforming data, which forms part of the real data science. At the same time, they are also carrying out Business Intelligence (BI) activities, such as creating charts, reports or graphs and using the data. Although there are great differences between the two sets of activities, they are equally important and complement each other well.

“For executing the BI functions and data science activities, most companies have professionally dedicated BI analysts as well as data scientists. However, it is here that companies often confuse the two without realizing that these two roles require different expertise. It is unfair to expect a BI analyst to be able to make accurate forecasts for the business. It could even spell disaster for any business. By studying the major differences between BI and real data science, you can choose the right candidate for the right tasks in your enterprise.”

So fund both, gentle reader. Patton distinguishes each position’s area of focus, the different ways each uses and looks at data, and their respective sources, migration needs, and job processes. If you need to hire someone to perform these jobs, check out this handy clarification before you write up those job descriptions.
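The distinction shows up clearly in code: BI describes what already happened, while data science predicts what comes next. A toy sketch of both activities, on invented monthly sales figures:

# Hypothetical monthly sales figures, for illustration only.
sales = [120, 135, 128, 150, 163, 158, 171, 185]

# BI-style work: aggregate and report on the past.
report = {
    "total": sum(sales),
    "average": sum(sales) / len(sales),
    "best_month": max(range(1, len(sales) + 1), key=lambda m: sales[m - 1]),
}
print("BI report:", report)

# Data-science-style work: fit a trend line (ordinary least squares)
# and forecast the next month.
n = len(sales)
x_mean, y_mean = (n + 1) / 2, sum(sales) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(1, n + 1), sales))
slope /= sum((x - x_mean) ** 2 for x in range(1, n + 1))
intercept = y_mean - slope * x_mean
print("Forecast, month", n + 1, "->", round(intercept + slope * (n + 1), 1))

Asking the person who builds the report to also validate the forecast is exactly the conflation Patton warns about.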

Cynthia Murrell, October 6, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Computers Learn Discrimination from Their Programmers

September 14, 2015

One of the great lessons of the Broadway classic South Pacific is that children are not born racist; they learn racism from their parents and other adults.  Computers are supposed to be infallible, objective machines, but Gizmodo’s article “Computer Programs Can Be As Biased As Humans” argues otherwise.  In this case, the computers are the “children,” and they observe discriminatory behavior from their programmers.

As an example, the article explains how companies use job application software to sift through prospective employees’ resumes.  Algorithms search for keywords related to experience and skills, with the goal of staying blind to sex and ethnicity.  The algorithms can also be used to sift out resumes that contain certain phrases and other information.

“Recently, there’s been discussion of whether these selection algorithms might be learning how to be biased. Many of the programs used to screen job applications are what computer scientists call machine-learning algorithms, which are good at detecting and learning patterns of behavior. Amazon uses machine-learning algorithms to learn your shopping habits and recommend products; Netflix uses them, too.”

The machine learning algorithms mimic the discriminatory habits of humans.  To catch these computer-generated biases, other machine learning algorithms are being implemented to keep the first set in check.  Another option is to re-present the training data in a different manner so the algorithms do not fall into old habits.  From a practical standpoint this makes sense: if something does not work the first few times, change the way it is done.
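One concrete way to “re-present the data” is reweighting: give more weight to (group, outcome) combinations the historical data under-represents, so the learner stops absorbing the skew. The sketch below is a from-scratch illustration of that idea (in the spirit of Kamiran and Calders’ reweighing technique), with hypothetical hiring labels.

from collections import Counter

def reweigh(groups, labels):
    """Weight examples so group and outcome look statistically independent."""
    n = len(labels)
    by_group, by_label = Counter(groups), Counter(labels)
    joint = Counter(zip(groups, labels))
    # weight = expected count under independence / observed count
    return [(by_group[g] * by_label[y] / n) / joint[(g, y)]
            for g, y in zip(groups, labels)]

# Hypothetical history: group B was rarely hired.
groups = ["A"] * 50 + ["B"] * 50
labels = [1] * 40 + [0] * 10 + [1] * 10 + [0] * 40
weights = reweigh(groups, labels)
print(round(weights[0], 2))   # 0.62: over-represented "A hired" rows shrink
print(round(weights[50], 2))  # 2.5: under-represented "B hired" rows grow

A model trained on the reweighted rows no longer “learns” that group B candidates are rarely hired.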

Whitney Grace, September 14, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

The AI Evolution

September 10, 2015

An article at WT Vox announces, “Google Is Working on a New Type of Algorithm Called ‘Thought Vectors’.” It sounds like a good use for a baseball cap with electrodes, a battery pack, WiFi, and a person who thinks great thoughts. In actuality, it’s a project based on the work of esteemed computer scientist Geoffrey E. Hinton, who has been exploring the idea of neural networks for decades. Hinton is now working with Google to create the sophisticated algorithm of our dreams (or nightmares, depending on one’s perspective).

Existing language processing software has come a very long way; Google Translate, for example, searches dictionaries and previously translated documents to translate phrases. The app usually does a passable job of giving one the gist of a source document, but results are far from reliably accurate (and are often grammatically comical). Thought vectors, on the other hand, will allow software to extract meanings, not just correlations, from text.

Continuing to use translation software as the example, reporter Aiden Russell writes:

“The technique works by ascribing each word a set of numbers (or vector) that define its position in a theoretical ‘meaning space’ or cloud. A sentence can be looked at as a path between these words, which can in turn be distilled down to its own set of numbers, or thought vector….

“The key is working out which numbers to assign each word in a language – this is where deep learning comes in. Initially the positions of words within each cloud are ordered at random and the translation algorithm begins training on a dataset of translated sentences. At first the translations it produces are nonsense, but a feedback loop provides an error signal that allows the position of each word to be refined until eventually the positions of words in the cloud captures the way humans use them – effectively a map of their meanings.”
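The quoted description maps directly onto what embedding code does: each word gets a vector, and a sentence is distilled to a point in the same space. The toy sketch below uses random stand-in vectors; a real system would refine them through exactly the feedback loop Russell describes.

import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "dog", "ran"]
# Each word's position in a 50-dimensional "meaning space" (random here).
vectors = {w: rng.normal(size=50) for w in vocab}

def thought_vector(sentence):
    """Distill a sentence to one vector by averaging its word vectors."""
    return np.mean([vectors[w] for w in sentence.split()], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# With trained vectors, related sentences would score near 1.0.
print(cosine(thought_vector("the cat sat"), thought_vector("the dog ran")))

Averaging is the crudest possible “path between words”; systems like the one described use recurrent networks to respect word order, but the vector-space idea is the same.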

But, won’t all efficient machine learning lead to a killer-robot-ruled dystopia? Hinton bats away that claim as a distraction; he’s actually more concerned about the ways big data is already being (mis)used by intelligence agencies. The man has a point.

Cynthia Murrell, September 10, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Algorithms Are Objective As Long As You Write Them

September 8, 2015

I read “Big Data’s Neutral Algorithms Could Discriminate against Most Vulnerable.” Ridiculous. Objective procedures cannot discriminate. The numerical recipes do what they do.

Ah, but when a human weaves together methods and look-up tables, sets thresholds, and applies Bayesian judgments, well, maybe a little bit of bias can be baked in.

The write up reports:

So how will the courts address algorithmic bias? From retail to real estate, from employment to criminal justice, the use of data mining, scoring software and predictive analytics programs is proliferating at an exponential rate. Software that makes decisions based on data like a person’s ZIP code can reflect, or even amplify, the results of historical or institutional discrimination. “[A]n algorithm is only as good as the data it works with,” Solon Barocas and Andrew Selbst write in their article “Big Data’s Disparate Impact,” forthcoming in the California Law Review. “Even in situations where data miners are extremely careful, they can still affect discriminatory results with models that, quite unintentionally, pick out proxy variables for protected classes.”
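The “proxy variables” problem is easy to demonstrate: a formally neutral field can encode a protected class almost perfectly. The records below are invented for illustration; the point is that any model handed a ZIP code is, statistically, handed group membership too.

from collections import Counter, defaultdict

# Invented records: (zip_code, protected_group). Segregated housing
# patterns make ZIP a stand-in for group.
records = ([("10001", "A")] * 45 + [("10001", "B")] * 5
           + [("10002", "B")] * 45 + [("10002", "A")] * 5)

by_zip = defaultdict(Counter)
for zip_code, group in records:
    by_zip[zip_code][group] += 1

# Guess each person's group from the majority group in their ZIP.
correct = sum(counts.most_common(1)[0][1] for counts in by_zip.values())
print(f"ZIP alone predicts group for {correct / len(records):.0%} of records")
# -> 90%. The algorithm never sees the protected class; it does not need to.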

And I liked this follow on:

It’s troubling enough when Flickr’s auto-tagging of online photos label pictures of black men as “animal” or “ape,” or when researchers determine that Google search results for black-sounding names are more likely to be accompanied by ads about criminal activity than search results for white-sounding names. But what about when big data is used to determine a person’s credit score, ability to get hired, or even the length of a prison sentence?

Shift gears. Navigate to “Microsoft Is Trying to Stop Users from Downloading Chrome or Firefox.” Objective, right?

Two thoughts. The math-oriented legal eagles will sort this out. Lawyers are really good at math. Also, write your own algorithm and tune it to deliver what you want. No bias there. You are expressing your inner self.

It’s just a process and billable.

Stephen E Arnold, September 8, 2015

Algorithms Still Need Oversight

September 8, 2015

Many have pondered what might happen when artificial intelligence systems go off the rails. While not spectacular enough for Hollywood, some very real consequences have been observed; the BBC examines “The Bad Things that Happen When Algorithms Run Online Shops.”

The article begins by relating the tragic tale of an online T-shirt vendor who just wanted to capitalize on the “Keep Calm and Carry On” trend. He set up an algorithm to place random terms into the second half of that oft-copied phrase and generate suggested products. Unfortunately, the list of phrases was not sufficiently vetted, resulting in some truly regrettable slogans on the mocked-up shirts. Even though the offending phrases appeared only on the website, never on an actual printed shirt, the business’s reputation never recovered, and it closed shortly thereafter. Reporter Chris Baraniuk writes:

“But that’s the trouble with algorithms. All sorts of unexpected results can occur. Sometimes these are costly, but in other cases they have benefited businesses to the tune of millions of pounds. What’s the real impact of the machinations of machines? And what else do they do?”

Well, one other thing they do is control prices. Baraniuk reports that software designed to set online prices competitively, based on what other sites are doing, can cause prices to fluctuate day to day, sometimes hour to hour. Without human oversight, results can quickly become extreme at either end of the scale. For example, for a short time last December, prices of thousands of products sold through Amazon were set to just one penny each. Amazon itself probably weathered the unintended near-giveaways just fine, but smaller merchants selling through the site were not so well-positioned; some closed as a direct result of the error. On the other hand, vendors trying to keep their prices as high as feasible can make the opposite mistake; the article points to the time a blogger found an out-of-print textbook about flies priced at more than $23 million, the result of two sellers’ dueling algorithms.
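That feedback loop is simple enough to simulate. In press accounts of the fly-book incident, one bot priced just below its competitor while the other priced well above; the multipliers below are the widely reported figures, and the rest of the sketch is illustrative.

# Two repricing bots reacting only to each other.
seller_a = seller_b = 35.00   # a plausible starting price for a textbook
rounds = 0
while seller_b < 23_000_000:
    seller_a = 0.9983 * seller_b     # undercut the competitor slightly
    seller_b = 1.270589 * seller_a   # stay comfortably above the competitor
    rounds += 1
print(f"${seller_b:,.2f} after {rounds} rounds of automated repricing")

Each round multiplies the price by roughly 1.27, so a $35 book clears $23 million in under 60 rounds. Nobody intended that number; no human was looking.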

Such observations clearly mean that consumers should be wary about online prices. The bigger takeaway, though, is that we are far from ready to hand algorithms the reins of our world without sufficient human oversight. Not yet.

Cynthia Murrell, September 8, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Neural Nets: A Categorical Affirmative Means No Exceptions

August 21, 2015

The opening sentence of “A Visual Proof That Neural Nets Can Compute Any Function” is a Duesy. Here it is:

One of the most striking facts about neural networks is that they can compute any function at all.

I am okay with neural nets, but I struggle against statements which assert universalities like “any function at all.”

There are fancy terms for this type of error; for example, formal fallacy.
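For the record, the result behind the slogan (due to Cybenko, and to Hornik and colleagues) is a statement about approximation of continuous functions, not computation of arbitrary ones. Roughly:

\[
\forall f \in C(K),\ \forall \varepsilon > 0,\ \exists\, N, v_i, w_i, b_i:\quad
\sup_{x \in K}\Bigl|\, f(x) - \sum_{i=1}^{N} v_i\,\sigma(w_i^{\top} x + b_i) \Bigr| < \varepsilon,
\]

where K is a compact subset of R^n and sigma is a suitable activation function. Continuous functions on a compact set, approximated to within epsilon: a far narrower claim than “any function at all.”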

What’s troubling is that the use of “all,” “every,” “any,” and other umbrella terms seems to be more common than Ashley Madison customer names.

Textbooks which “teach” that something works for “any function” make overgeneralization a standard operating procedure.

Untidy? Yep.

Stephen E Arnold, August 21, 2015

How to Use Watson

August 17, 2015

While there are many possibilities for cognitive computing, what makes an idea a reality is its feasibility and real-life application.  The Platform explores “The Real Trouble With Cognitive Computing” and the trouble IBM had (and still has) figuring out what to do with the supercomputer it made.  The article explains that before Watson became a Jeopardy celebrity, the IBM folks came up with 8,000 potential experiments for Watson to do, but pursued only 20 percent of them.

The range is small due to many factors, including bug testing, gauging progress from fuzzy outputs, juggling algorithmic interactions, testing in isolation, and more.  These constraints force a “messy” way of developing the experiments.  Ideally, developers would have one big knowledge model they could simply query, but that option does not exist.  The messy way instead involves keeping the data sources intact, layering on natural language processing, machine learning, and knowledge representation, and then distributing the whole thing across an infrastructure.
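In other words, the work is a composition of loosely coupled stages rather than queries against one grand model. The stub pipeline below is purely hypothetical, with placeholder functions standing in for real NLP and ML components, but it shows why each stage is hard to test in isolation.

def extract_text(source):
    """Stub: pull raw text from an intact data source (feed, DB, PDF)."""
    return source["text"]

def nlp_annotate(text):
    """Stub: a real system would run a full tagging and parsing stack."""
    return {"tokens": text.split()}

def learn(annotations):
    """Stub ML stage: emit a scored hypothesis (a fuzzy output)."""
    return {"answer": annotations["tokens"][0], "confidence": 0.42}

def to_knowledge(result, graph):
    """Stub: fold the hypothesis into a knowledge representation."""
    graph.setdefault(result["answer"], []).append(result["confidence"])
    return graph

graph = {}
for doc in [{"text": "cognitive computing is messy"}]:
    graph = to_knowledge(learn(nlp_annotate(extract_text(doc))), graph)
print(graph)  # {'cognitive': [0.42]}

Because the only ground truth lives at the end of the chain, “gauging progress with fuzzy outputs” means debugging every stage through the ones downstream of it.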

Here is another key point that makes clear sense:

“The big issue with the Watson development cycle too is that teams are not just solving problems for one particular area. Rather, they have to create generalizable applications, which means what might be good for healthcare, for instance, might not be a good fit—and in fact even be damaging to—an area like financial services. The push and pull and tradeoff of the development cycle is therefore always hindered by this—and is the key barrier for companies any smaller than an IBM, Google, Microsoft, and other giants.”

This is exactly correct!  Engineering is not the same as healthcare, and not all computer algorithms transfer across industries.  One thing to keep in mind, though: you can apply methods from other industries and come up with new methods or solutions.

Whitney Grace, August 17, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Thunderstone Rumbles about Webinator

August 13, 2015

There is nothing more frustrating than being unable to locate a specific piece of information on a Web site using its search function.  Search is supposed to be quick, accurate, and efficient.  Even when Google search is employed as a Web site’s search feature, it does not always yield the best results.  Thunderstone is a company that specializes in proprietary software applications developed specifically for information management, search, retrieval, and filtering.

Thunderstone’s client list includes, but is not limited to, government agencies, Internet developers, corporations, and online service providers.  The company’s goal is to deliver “product-oriented R&D within the area of advanced information management and retrieval,” which translates to helping clients find information very, very fast and as accurately as possible.  That is the premise of most information management companies.  The company blog announced, “Thunderstone Releases Webinator Web Index And Retrieval System Version 13.”  Webinator makes it easier to integrate high-quality search into a Web site, and the new release has several appealing features:

  • “Query Autocomplete, guides your users to the search they want
  • HTML Highlighting, lets users see the results in the original HTML for better contextual information
  • Expanded XML/SOAP API allows integration of administrative interface”
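Thunderstone does not publish Webinator’s internals, so as a generic illustration only: query autocomplete of the kind listed above is commonly built as a prefix lookup over a sorted vocabulary.

import bisect

class Autocomplete:
    """Prefix completion over a sorted term list (a generic technique,
    not Thunderstone's actual implementation)."""
    def __init__(self, terms):
        self.terms = sorted(set(t.lower() for t in terms))

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        start = bisect.bisect_left(self.terms, prefix)  # first candidate
        out = []
        for term in self.terms[start:]:
            if not term.startswith(prefix):
                break                                   # past the prefix range
            out.append(term)
            if len(out) == limit:
                break
        return out

ac = Autocomplete(["search", "searching", "searchable", "retrieval", "index"])
print(ac.suggest("sear"))  # ['search', 'searchable', 'searching']

Real products rank suggestions by query popularity rather than alphabetically, but a prefix index of this sort is the core of the feature.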

We like the HTML highlighting that offers users the ability to backtrack and see a page’s original information source. It is very similar to old-fashioned research: go back to the original source to check a fact’s veracity.

Whitney Grace, August 13, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
