Predictions and Experts: Maybe Ignore Them or Just Punish Them?

May 13, 2019

I read “The Peculiar Blindness of Experts” with this subtitle:

Credentialed authorities are comically bad at predicting the future. But reliable forecasting is possible.

The write up reminded me of an anthologized essay in freshman English 101. I suggest you take a look at the original. There is a subtext chugging along in this lengthy write up. To whet your appetite, consider this passage which I circled in True Blue marker:

Unfortunately, the world’s most prominent specialists are rarely held accountable for their predictions, so we continue to rely on them even when their track records make clear that we should not.

Is the message "Get it wrong and get punished"? If so, outputs from Recorded Future or horse race touts might be altered.

There is a bit of hope for those who can learn:

The best forecasters, by contrast, view their own ideas as hypotheses in need of testing. If they make a bet and lose, they embrace the logic of a loss just as they would the reinforcement of a win. This is called, in a word, learning.

Is smart software like a hedgehog or a fox?

I won’t predict your response.

Stephen E Arnold, May 13, 2019

China: Patent Translation System

May 10, 2019

Patents are usually easily findable documents. However, reading a patent once found is a challenge. Up the ante if the patent is in a language the person does not read. “AI Used to Translate Patent Documents” provides some information about a new system available from the Intellectual Property Publishing House. According to the article in China Daily:

The system can translate Chinese into English, Japanese and German and vice versa. Its accuracy in two-way translation between Chinese and Japanese has reached 95 percent, far more than the current industry average, and the rest has topped 90 percent…

The system uses a dictionary, natural language processing algorithms, and a computational model. In short, this is a collection of widely used methods tuned over a decade by the Chinese organization. In that span, Thomson Reuters dropped out of the patent game, and just finding patents, even in the US, can be a daunting task.
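
The article offers no technical detail, so here is a generic sketch of the method class using an open source Chinese-to-English neural translation model from Hugging Face. This is an illustration only, not the Intellectual Property Publishing House system:

    # Generic neural machine translation with a public pretrained model.
    from transformers import MarianMTModel, MarianTokenizer

    model_name = "Helsinki-NLP/opus-mt-zh-en"   # public Chinese-to-English model
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    claim = "一种用于专利文献的机器翻译系统"      # sample patent-style phrase
    batch = tokenizer([claim], return_tensors="pt", padding=True)
    output = model.generate(**batch)
    print(tokenizer.decode(output[0], skip_special_tokens=True))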

Translation has been an even more difficult task for some lawyers, researchers, analysts, and academics.

If the information in the China Daily article is accurate, China may have an intellectual property advantage. The write up offers some details, which sound interesting; for example:

  • Translation of a Japanese document: five seconds
  • Patent documents record 90 percent of a country’s technology and innovation
  • China has “a huge database of global patents”.

And the other 10 percent? Maybe other methods are employed.

Stephen E Arnold, May 10, 2019

Algorithms: Thresholds and Recycling Partially Explained

April 19, 2019

Five or six years ago I prepared a lecture about the weaknesses in widely used algorithms. In that talk, which I delivered to intelligence operatives in Western Europe and the US, I made two points which were significant to me and my small research team.

  1. There are about nine or 10 algorithms which are used again and again. One example is k-means; see the sketch after this list. The reason is that the procedure is a fixture in many university courses, and the method is good enough.
  2. Quite a bit of the work on smart software relies on cutting and pasting. In 1962, I discovered the value of this approach when I worked on a small project at my undergraduate university. Find a code snippet that does the needed task, modify it if necessary, and bingo! Today this approach remains popular.
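
The recycling is easy to illustrate. Here is a minimal k-means (Lloyd's algorithm) sketch in plain Python and numpy: a classroom version of the "good enough" procedure, not anyone's production system.

    # Minimal k-means: assign points to nearest centroid, recompute
    # centroids as cluster means, repeat until nothing moves.
    import numpy as np

    def kmeans(points, k, iterations=100, seed=0):
        rng = np.random.default_rng(seed)
        # Pick k distinct points as the starting centroids.
        centroids = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(iterations):
            # Distance from every point to every centroid.
            distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
            labels = distances.argmin(axis=1)
            # New centroid: the mean of each cluster (keep the old one if empty).
            new_centroids = np.array([
                points[labels == i].mean(axis=0) if (labels == i).any() else centroids[i]
                for i in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break  # converged
            centroids = new_centroids
        return labels, centroids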

I thought about my lectures and these two points when I read another part of the mathy series "Untold History of AI: Algorithmic Bias Was Born in the 1980s." IEEE Spectrum does a reasonable job of explaining one case of algorithmic bias. The story is similar to the experience Amazon had with one of its smart modules. The math produced wonky results. The word "bias" is okay with me, but the outputs from systems which happily chug away and deliver results to clueless MBAs, lawyers, and marketers may be incorrect.

Several observations:

  1. The bias in methods goes back before I showed up at the university computer center to use the keypunch machines. Way back, in fact.
  2. Developers today rely on copy and paste, open source, and the basic methods taught by professors who may be thinking about their side jobs as consultants.
  3. Training data may be skewed, and no one wants to spend the money or take the time to create training data. Why bother? Just use whatever is free, cheap, or already on a storage device. Close enough for horseshoes.
  4. Users do not know what's going on behind the point and click interfaces, nor do most users care. As a result, a good graphic is "correct."

The chatter about the one percent focuses on money. There is another, more important one percent in my opinion. The one percent who take the time to look at a sophisticated system will find the same nine or 10 algorithms, the same open source components, and some recycled procedures that few think about. Quick question: How many smart software systems rely on Thomas Bayes’ methods? Give up? Lots.
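
For the curious, the Bayes arithmetic underneath many of these systems fits in a few lines. A toy sketch with invented numbers, scoring a message as spam given a single trigger word:

    # Bayes' rule: P(spam | word) = P(word | spam) * P(spam) / P(word).
    # All numbers below are made up for illustration.
    p_spam = 0.2              # prior: 20 percent of messages are spam
    p_word_given_spam = 0.6   # the word appears in 60 percent of spam
    p_word_given_ham = 0.05   # and in 5 percent of legitimate mail

    numerator = p_word_given_spam * p_spam
    evidence = numerator + p_word_given_ham * (1 - p_spam)
    print(numerator / evidence)   # posterior: 0.12 / 0.16 = 0.75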

I don’t have a remedy for this problem, and I am not sure too many people care or want to talk about the “accuracy” of a smart system’s outputs. That’s a happy thought for the weekend. Imagine bad outputs in an autonomous drone or a smart system in a commercial aircraft. Exciting.

Stephen E Arnold, April 19, 2019


Facial Recognition: An Important Technology Enters Choppy Waters

April 8, 2019

I wouldn’t hold my breath: The Electronic Frontier Foundation (EFF) declares, “Governments Must Face the Facts About Face Surveillance, and Stop Using It.” Writers Hayley Tsukayama and Adam Schwartz begin by acknowledging reality—the face surveillance technology business is booming, with the nation’s law enforcement agencies increasingly adopting it. They write:

EFF supports legislative efforts in Washington and Massachusetts to place a moratorium on government use of face surveillance technology. These bills also would ban a particularly pernicious kind of face surveillance: applying it to footage taken from police body-worn cameras. The moratoriums would stay in place, unless lawmakers determined these technologies do not have a racial disparate impact, after hearing directly from minority communities about the unfair impact face surveillance has on vulnerable people. We recently sent a letter to Washington legislators in support of that state’s moratorium bill.

EFF’s communications may be having some impact.

DarkCyber noted that Amazon will be allowing shareholders a vote about sales of the online bookstore’s facial recognition technology, Rekognition. “AI Researchers Tell Amazon to Stop Selling Facial Recognition to the Police” does not explain how Amazon can remove its facial recognition system from those entities which have licensed the technology.

DarkCyber believes that the US is poised to become a procurement innovation center. Companies and their potential customers have to figure out how to work together without creating political, legal, and financial disruptions.

A failure to resolve what seems to be a more common problem may allow vendors in other countries to capture leading engineers, major contracts, and a lead in an important technology.

Stephen E Arnold, April 8, 2019

Content Management: Now a Playground for Smart Software?

March 28, 2019

CMS or content management systems are a hoot. Sometimes they work; sometimes they don’t. How does one keep these expensive, cranky databases chugging along in the zip zip world of content utilities which are really inexpensive?

Smart software and predictive analytics?

Managing a website is not what it used to be, and one of the biggest changes to content management systems is the use of predictive analytics. The Smart Data Collective discusses “The Fascinating Role of Predictive Analytics in CMS Today.” Reporter Ryan Kh writes:

“Predictive analytics is changing digital marketing and website management. In previous posts, we have discussed the benefits of using predictive analytics to identify the types of customers that are most likely to convert and increase the value of your lead generation strategy. However, there are also a lot of reasons that you can use predictive analytics in other ways. Improving the quality of your website is one of them. One of the main benefits of predictive analytics in 2019 is in improving the performance of content management systems. There are a number of different types of content management systems on the market, including WordPress, Joomla, Drupal, and Shopify. There are actually hundreds of content management systems on the market, but these are some of the most noteworthy. One of the reasons that they are standing out so well against their competitors is that they use big data solutions to get the most value for their customers.”

The author notes two areas in which predictive analytics are helping companies’ bottom lines: fraud detection and, of course, marketing optimization; the latter through capacities like more effective lead generation and content validation.
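
A hedged sketch of the "likely to convert" scoring the article describes, using scikit-learn's logistic regression. The features and data are hypothetical; actual CMS analytics modules are proprietary:

    # Toy lead-scoring model. Columns: pages viewed, minutes on site,
    # downloaded whitepaper (0/1). Labels: 1 = visitor converted.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[2, 1.5, 0], [8, 12.0, 1], [1, 0.5, 0], [6, 9.0, 1]])
    y = np.array([0, 1, 0, 1])

    model = LogisticRegression().fit(X, y)
    new_visitor = np.array([[5, 7.0, 1]])
    # Estimated probability that the new visitor converts.
    print(model.predict_proba(new_visitor)[0, 1])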

Yep, CMS with AI. The future with spin.

Cynthia Murrell, March 28, 2019

Good News about Big Data and AI: Not Likely

February 25, 2019

I read a write up which was a bit of a downer. The story appeared in Analytics India and was titled “10 Challenges That Data Science Industry Still Faces.” Oh, oh. Maybe not good news?

My first thought was, “Only 10?”

The write up explains that the number one challenge is humans. My first assumption was that smart software would solve human-generated problems: sluggish workers at fast food restaurants, fascinating decisions made by entry level workers in some government bureaus, and the often remarkable statements offered by talking heads on US cable TV “real news” programs, among others.

Nope. The number one challenge is finding humans who can do data science work.

What’s number two after this somewhat thorny problem? The answer is finding the “right data” and then getting a chunk of data one can actually process.

So one and two are what I would call bedrock issues: Expertise and information.

What about the other eight challenges? Here are three of them. I urge you to read the original article for the other five issues.

  • Informing people why data science and its related operations are good for you. Is this similar to convincing a three year old that lima beans are just super?
  • Storytelling. I think this means, “These data mean…” One hopes the humans (who are in short supply) draw the correct inferences. One hopes.
  • Models. This is a shorthand way of saying, “What’s assembled will work.” Hopefully the answer is, “Sure, our models are great.”

Analytics India has taken a risk with its write up. None of the data science acolytes want to hear “bad news.”

Let’s federate and analyze that with great data we can select to generate a useful output. Maybe 80 percent “accuracy” on a good day?

Stephen E Arnold, February 25, 2019

False Positives: The New Normal

January 1, 2019

And this is why so many people are wary of handing too much power to algorithms. TechDirt reports, “School Security Software Decided Innocent Parent Is Actually a Registered Sex Offender.” That said, it seems some common sense on the part of the humans involved would have prevented the unwarranted humiliation. The mismatch took place at an Aurora, Colorado, middle school event, where parent Larry Mitchell presumably just wanted to support his son. When office staff scanned his license, however, the Raptor system flagged him as a potential offender. Reporter Tim Cushing writes:

“Not only did these stats [exact name and date of birth] not match, but the photos of registered sex offenders with the same name looked nothing like Larry Mitchell. The journalists covering the story ran Mitchell’s info through the same databases — including Mitchell’s birth name (he was adopted) — and found zero matches. What it did find was a 62-year-old white sex offender who also sported the alias ‘Jesus Christ,’ and a black man roughly the same age as the Mitchell, who is white. School administration has little to say about this botched security effort, other than policies and protocols were followed. But if so, school personnel need better training… or maybe at least an eye check. Raptor, which provides the security system used to misidentify Mitchell, says photo-matching is a key step in the vetting process….

We also noted:

“Even if you move past the glaring mismatch in photos (the photos returned in the Sentinel’s search of Raptor’s system are embedded in the article), neither the school nor Raptor can explain how Raptor’s system returned results that can’t be duplicated by journalists.”
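
The logic failure is easy to state in code. Here is a hedged sketch of the conservative rule the school could have applied (field names and dates are invented; Raptor's actual matching logic is not public): require every identifier to agree, and even then escalate to a human photo check rather than act automatically.

    # Flag only when name AND date of birth both match; a name alone
    # should never trigger an alert. All values below are invented.
    def is_possible_match(visitor, offender_record):
        return (visitor["name"].lower() == offender_record["name"].lower()
                and visitor["date_of_birth"] == offender_record["date_of_birth"])

    visitor = {"name": "Larry Mitchell", "date_of_birth": "1970-01-01"}
    record = {"name": "Larry Mitchell", "date_of_birth": "1956-05-05"}

    if is_possible_match(visitor, record):
        print("Escalate to staff for a photo comparison")  # human in the loop
    else:
        print("No match; admit the visitor")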

This looks like a mobile version of the PEBCAK error, and such mistakes will only increase as these verification systems continue to be implemented at schools and other facilities across the country. Cushing rightly points to this problem as “an indictment of the security-over-sanity thinking.” Raptor, a private company, is happy to tout its great success at keeping registered offenders out of schools, but they do not reveal how often their false positives have ruined an innocent family’s evening, or worse. How much control is our society willing to hand over to AIs (and those who program them)?

Cynthia Murrell, January 1, 2019

Will Algorithms Become a Dying Language?

December 30, 2018

It may sound insane, considering how much of our daily life revolves around algorithms. From your work, to your online shopping, to the maps that guide you on vacation, we depend on these codes. However, some engineers fear older algorithms will be lost to the sands of time and future generations will not be able to learn from them. Thankfully, a solution has arrived in the form of The Algorithm Archive.

According to its mission statement:

“The Arcane Algorithm Archive is a collaborative effort to create a guide for all important algorithms in all languages. This goal is obviously too ambitious for a book of any size, but it is a great project to learn from and work on and will hopefully become an incredible resource for programmers in the future.”

A program like this is so important. Maybe the parties with the most to learn from this long evolution of algorithms are public government agencies. Some writers think many of these agencies have no idea what is in their algorithms, let alone how much they have to do with major policy decisions. Hindsight is truly 20/20.

Patrick Roland, December 30, 2018

Who Is a Low Risk Hire?

November 21, 2018

Last week, a person who did some contract work for me a year ago asked me if I would provide a reference. I agreed. I assumed that a caring, thoughtful human resources professional would speak with me on the telephone. Wrong. I received a text message asking me if I would answer questions. Get this. Each text message would contain a question about the person who sought a reference. After I hit send, I would receive another text message.

Wrong.

I was then sent a link to an online form that assured me my information was confidential. “Https” was not part of this outfit’s game plan. I worked through a form, providing scores from one to seven about the person. The fact that I hired this person to perform a specific job for me was evidence that the individual could be trusted. I am not making chopped liver or cranking out greeting cards. We produce training information for law enforcement and intelligence professionals.

I worked through the questions, which struck me as more concerned with appearing interested in the individual than with obtaining concrete information about the person. Here’s an example of what the online test reveals:

[image: the form’s scored output, including a rating for “adaptability”]

Yeah, pretty much useless. I am not sure what “adaptability” means. I tell contractors what I want. The successful contractor does that task and gets paid. A contractor who does not gets cut out of the pool. This means in politically incorrect speak: Gets fired.

I read “Public Attitudes Toward Computer Algorithms” a couple of days after going through this odd ball way to get information about a person working on law enforcement and intelligence related work. The write up makes clear that other people are not keen on the use of opaque methods to figure out if a person can do good work and be trusted.

Well, gentle reader, get used to this.

Human resources professionals want to cover their precious mortgage, make a car payment, or buy a new gizmo at the Amazon online store. The HR professionals are not eager to be responsible for screening individuals and figuring out what questions to ask a person like me. For good reason: I am not sure I would spend more than two minutes on the phone with an actual HR person. For the last 30 years, I have worked as an independent consultant. My only interactions with HR are limited to my suggesting that the individual stay away from me. Fill out forms or something. Just leave me alone, or you will be talking to individuals whom I pay to make you go away. I have a Mensa paralegal who can tie almost anyone in knots.

Several observations:

  1. Algorithms for hiring are a big, big thing. Why? Tail covering and document trails that say, “See, I did everything required by applicable regulations.” Forget judgment.
  2. The online angle is cheaper than having an actual old fashioned HR department. Outsource benefit reduction. Outsource candidate screening. Heck, outsource the outsourcing.
  3. No one wants to be responsible for anything. Look at the high school science club management methods at Facebook. The founder is at war. Former employees explain that no one gave direction. Yada yada.
  4. The use of algorithms presumably leads to efficiencies; that is, lower costs, better, faster, cheaper, MBA and bean counter fits of joy.

Just as Apple’s Tim Cook sees nothing objectionable about taking Google’s money as Apple talks up its privacy / security commitment, algorithms make everything — including HR — much better.

Net net: I am glad I am old and officially cranking along at 75, not a hapless 22 year old trying to get a job and do a good job at a zippy de doo dah company.

Stephen E Arnold, November 21, 2018

Amazon Rekognition: Great but…

November 9, 2018

I have been following the Amazon response to employee demands to cut off the US government. Put that facial recognition technology on “ice.” The issue is an intriguing one; for example, Rekognition plugs into DeepLens. DeepLens connects with Sagemaker. The construct allows some interesting policeware functions. Ah, you didn’t know that? Some info is available if you view the October 30 and November 6, 2018, DarkCyber. Want more info? Write benkent2020 at yahoo dot com.


How realistic is 99 percent accuracy? Pretty realistic when one has a bounded data set against which to compare a single image of adequate resolution and sharpness.

What caught my attention was the “real” news in “Amazon Told Employees It Would Continue to Sell Facial Recognition Software to Law Enforcement.” I am less concerned about the sales to the US government. I was drawn to these verbal perception shifters:

  • under fire. [Amazon is taking flak from its employees who don’t want Amazon technology used by LE and similar services.]
  • track human beings [The assumption is tracking is bad until the bad actor tracked is trying to kidnap your child, then tracking is wonderful. This is the worst type of situational reasoning.]
  • send them back into potentially dangerous environments overseas. [Are Central and South America overseas, gentle reader?]

These are hot buttons.

But I circled in pink this phrase:

…Rekognition is research proving the system is deeply flawed, both in terms of accuracy and regarding inherent racial bias.

Well, what does one make of the statement that Rekognition is powerful but has fatal flaws?

Want proof that Rekognition is something more closely associated with Big Lots than Amazon Prime? The write up states:

The American Civil Liberties Union tested Rekognition over the summer and found that the system falsely identified 28 members of Congress from a database of 25,000 mug shots. (Amazon pushed back against the ACLU’s findings in its study, with Matt Wood, its general manager of deep learning and AI, saying in a blog post back in July that the data from its test with the Rekognition API was generated with an 80 percent confidence rate, far below the 99 percent confidence rate it recommends for law enforcement matches.)

Yeah, 99 percent confidence. Think about that. Pretty reasonable, right? Unfortunately 99 percent is like believing in the tooth fairy, at least in terms of a US government spec or Statement of Work. Reality for the vast majority of policeware systems is in the 75 to 85 percent range. Pretty good in my book because these are achievable accuracy percentages. The 99 percent stuff is window dressing and will be for years to come.
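
The threshold at issue is a single parameter in the public AWS SDK. A sketch using boto3 (the collection name and image are placeholders): boto3's default FaceMatchThreshold is 80 percent, the setting the ACLU test reportedly used, while Amazon recommends 99 percent for law enforcement matches.

    # Search a face collection with the stricter threshold Amazon recommends.
    import boto3

    rekognition = boto3.client("rekognition", region_name="us-east-1")

    with open("probe.jpg", "rb") as f:   # placeholder probe image
        response = rekognition.search_faces_by_image(
            CollectionId="mugshot-collection",   # placeholder collection
            Image={"Bytes": f.read()},
            FaceMatchThreshold=99,               # raised from the 80 default
            MaxFaces=5,
        )

    for match in response["FaceMatches"]:
        print(match["Face"]["FaceId"], match["Similarity"])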

Also, Amazon, the Verge points out, is not going to let folks tinker with the Rekognition system to determine how accurate it really is. I learned:

The company has also declined to participate in a comprehensive study of algorithmic bias run by the National Institute of Standards and Technology that seeks to identify when racial and gender bias may be influencing a facial recognition algorithm’s error rate.

Yep, how about those TREC accuracy reports?

My take on this write up is that Amazon is now in the sights of the “real” journalists.

Perhaps the Verge would like Amazon to pull out of the JEDI procurement?

Great idea for some folks.

Perhaps the Verge will dig into the other components of Rekognition and then plot the improvements in accuracy when certain types of data sets are used in the analysis.

Facial recognition is not the whole cloth. Rekognition is one technology thread which needs a context that moves beyond charged language and accuracy rates which are in line with those of other advanced systems.

Amazon’s strength is not facial recognition. The company has assembled a policeware construct. That’s news.

Stephen E Arnold, November 9, 2018
