Boolean Search: Will George Boole Rotate in His Grave?

January 12, 2016

George Boole is, for most math wonks, the father of Boolean logic. This is a nifty way to talk about sets and what they contain. One can perform algebra and differential equations whilst pondering George and his method for thinking about fruits when he went shopping.

In the good old days of search, there was one way to search. One used AND, OR, NOT, and maybe a handful of other logic operators to retrieve information from structured indexes and content. Most folks with a library science degree or a friendly math major can explain Boolean reasonably well. Here’s an example which might work on CSA ProQuest (née Lockheed Dialog) even today:

CC=77? AND scam?

The systems, when fed the right query, would reply with pretty good precision and recall. Precision provided info that was supposed to be useful. Recall meant that what should be included was in the result set.
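For readers who have not had the pleasure, here is a minimal sketch of how AND, OR, and NOT reduce to set operations, and how precision and recall are scored. Everything in it is an assumption for illustration: the toy index, the document numbers, the relevance judgments, and the `lookup` and `precision_recall` helpers are hypothetical. The trailing `?` is treated as a truncation wildcard in the spirit of the Dialog-style query above. Field prefixes such as `CC=` are not modeled.

```python
# Hypothetical toy inverted index: term -> set of document ids.
INDEX = {
    "scam": {1, 2, 5},
    "scams": {3},
    "scammer": {5, 6},
    "fraud": {2, 4},
    "audit": {4, 6},
}


def lookup(term):
    """Return document ids for a term; a trailing '?' means truncation."""
    if term.endswith("?"):
        stem = term[:-1]
        hits = set()
        for indexed_term, docs in INDEX.items():
            if indexed_term.startswith(stem):
                hits |= docs
        return hits
    return set(INDEX.get(term, set()))


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    true_hits = retrieved & relevant
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Boolean operators map directly onto set operations.
and_result = lookup("scam?") & lookup("fraud")  # AND -> intersection
or_result = lookup("scam?") | lookup("fraud")   # OR  -> union
not_result = lookup("scam?") - lookup("fraud")  # NOT -> difference

# Hypothetical relevance judgments for "scam? AND fraud".
relevant = {2, 4}
precision, recall = precision_recall(and_result, relevant)
print(sorted(and_result), precision, recall)
```

In this toy case the AND query is tight: everything it returns is relevant, but it misses half of what it should have found. Swapping AND for OR pushes recall up and precision down, which is the trade-off a trained searcher juggles.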

I thought about Boole, fruit, and logic when I read “The Best Boolean and Semantic Search Tool.” Was I going to read about SDC’s ORBIT, ESA Quest, or (heaven help me) the original Lexis system?

Nope.

I learned about LinkedIn. Not one word about Palantir’s injecting Boolean logic squarely in the middle of its advanced data management processes. Nope.

LinkedIn. I thought that LinkedIn used open source Lucene, but maybe the company has invested in Exorbyte, Funnelback, or some other information access system.

The write up stated:

If you use any source of human capital data to find and recruit people (e.g., your ATS/CRM, resume databases, LinkedIn, Google, Facebook, Github, etc.) and you really want to understand how to best approach your talent sourcing efforts, I recommend watching this video when you have the time.

Okay, human resource functions. LinkedIn, right.

But there is zero content in the write up. I was pointed to a video called “Become a LinkedIn Search Ninja: Advanced Boolean Search” on YouTube.

Here’s what I learned before I killed the one hour video:

  1. The speaker is in charge of personnel and responsible for Big Data activities related to human resources
  2. Search is important to LinkedIn users
  3. Profiles of people are important
  4. Use OR. (I found this suggestion amazing.)
  5. Use iterative, probabilistic, and natural language search, among others. (Yep, that will make sense to personnel professionals.)

Okay. I hit the stop button. Not only will George be rotating, I may have nightmares.

Please, let librarians explicitly trained in online search and retrieval explain methods for obtaining on point results. Failing a friendly librarian, ask someone who has designed a next generation system which provides “helpers” to allow the user to search and get useful outputs.

Entity queries are important. LinkedIn can provide some useful information. The tools to obtain that high value information are a bit more sophisticated than the recommendations in this video.

Stephen E Arnold, January 12, 2016

Search Online Too Long? Tietze Disease Will Get You

January 8, 2016

I read “Technology Addict Develops Tietze Disease from Spending 23 Hours a Day Online.” I know, gentle reader, that using search engines can be frustrating. I know too that most of my readers spend hours upon hours trying to make Bing, Google, and Yandex point to a specific document which will answer their most pressing business question.

The fix is little more than search systems which return relevant results without ads and fluff.

Be aware. If you find yourself investing hours upon hours in crafting queries, you may succumb to “shooting pains” in your “back and chest.” You may have strained your “costal cartridges.”

The culprit: Tietze disease.

Rest easy. The problem is benign. Go back to searching. Be tough.

Stephen E Arnold, January 8, 2016

The Long Goodbye of Internet Freedom Heralded by CISA

January 8, 2016

The article on MotherBoard titled “Internet Freedom Is Actively Dissolving in America” paints a bleak picture of our access to the “open internet.” In spite of the net neutrality win this year, broadband adoption is decreasing, and the number of poor Americans forced to choose between broadband and smartphone internet is on the rise. In addition to these unfortunate trends,

“Congress and President Obama made the Cybersecurity Information Sharing Act a law by including it in a massive budget bill (as an extra gift, Congress stripped away some of the few privacy provisions in what many civil liberties groups are calling a “surveillance bill”)… Finally, the FBI and NSA have taken strong stands against encryption, one of the few ways that activists, journalists, regular citizens, and yes, criminals and terrorists can communicate with each other without the government spying.”

What this means for search, and for our access to the Internet in general, is yet to be seen. The effects of security laws and encryption opposition will obviously be far-reaching, but at what point do we stop getting the information that we need to be informed citizens?

And when you search, if it is not findable, does the information exist?

Chelsea Kerwin, January 8, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

US Broadband: Good News or Some Obvious News

December 29, 2015

I read “Home Broadband 2015.” The write up is from Pew, a US research outfit. What’s interesting about Pew is that its results are used by some MBAs, former middle school teachers, and unemployed “real” journalists as evidence about the world. You know. If the US is like this, then Sudan is going to be just the same. I find this a somewhat touching approach to the world’s uptake of online connectivity. Life is just easier to manage if the Pew view is the lens through which one perceives behavior.

Now to the research.

The write up reports:

Three notable changes relating to digital access and digital divides are occurring in the realm of personal connectivity, according to new findings from Pew Research Center surveys. First, home broadband adoption seems to have plateaued. It now stands at 67% of Americans, down slightly from 70% in 2013, a small but statistically significant difference which could represent a blip or might be a more prolonged reality. This change moves home broadband adoption to where it was in 2012.

Okay, plateau is a metaphor for “hit a brick wall”. The implications are likely to be important for those not in the top one percent who want to buy a whizzy new iPhone to figure out what gifts are selling. (If it is not on the shelf, isn’t that a clue?)

The write up says:

Second, this downtick in home high-speed adoption has taken place at the same time there has been an increase in “smartphone-only” adults – those who own a smartphone that they can use to access the internet, but do not have traditional broadband service at home.

Maybe this explains the somewhat energetic efforts of outfits like the Alphabet Google thing to find additional sources of revenue. Loon balloons, self driving autos, and solving death come to mind. Nothing will sell like a pill to cure death. The device shift makes it harder to put ads in front of eye balls not interested in viewing the commercial messages. With home or desktop anchor surfing stagnating, the business models have to be tweaked. Pronto.

Also, I noted:

Third, 15% of American adults report they have become “cord cutters” – meaning they have abandoned paid cable or satellite television service.

This datum suggests to me that there may be some revenue pain for the purveyors of cords.

The write up is long. I had to click eight times to read the summary. The post includes many nifty, pale graphics. These are somewhat difficult to read on the mobile devices which the write up explains are the cat’s pajamas with Star Wars’ characters on the synthetic flannel.

I found the information about non broadband users’ perceptions of what’s important about having a zippy Internet connection interesting. The surprise is that in 2015, 40 percent of the sample for this question want to use high speed Internet to access government services.

image

Hard to read? Too bad.

This makes sense. Have you, gentle reader, attempted to interact with a US government agency in person? Give it a whirl. The problem is that the US government Web sites are not particularly helpful for many situations. Run a query on USA.gov to see what I mean.

The discussion about cost seems obvious. With the notion of income disparity squeezing air time on TV news from the coverage of the NFL and Lady Gaga, I find the idea that those without resources find broadband too expensive unsurprising. Okay. Obvious to me, but I think the Pew data make the point. The sky is blue, but wait, let me check a survey to make sure. Those without graduate degrees, jobs or sources of income, and the knowledge required to achieve cash flow are affected by costs. Got it.

The good news is that on page 7 Pew explains its methodology. Most of the hyperbole-infused marketers skip this step. I also found the data table on page 8 of the online report interesting. Let’s have more data tables and less of the dancing around the flat lines and inequality stuff.

Worth reading if one wants some obvious points reiterated. Google and other ad supported services will not work unless the ads flow. That’s the take away for me.

Stephen E Arnold, December 29, 2015

Money Laundering: Digital Currency or Old Fashioned Methods?

November 27, 2015

Online is zeros and ones. I worked for a number of years for a fellow with lots of money who explained, “Money is information.” He was mostly correct. However, in the world of big time money laundering, online does not yet have the NFL lineman muscles to do the entire job of keeping financial transactions secret.

The challenge with digital currencies boils down to a search and retrieval problem. Actionable information is embedded in transaction data. Bad actors may not be Bitcoin fans for certain types of unregulated cash transfer tasks.

Navigate to “‘White Gloves,’ ‘VIP Boxes:’ How It’s Done at China’s Underground Banks,” which does a good job of explaining how more traditional money laundering is handled. Bitcoin is okay for moving assets if one has the time, the operational security, and expertise to make the system work.

For folks with JP Morgan-style funds, something more robust and reliable may be needed. Oh, the ability to keep the activity hard to find, hidden from regulators and tax authorities, and reliable is important.

The article states:

In one case Xinhua highlighted this week, state investigators accused a longtime general manager, surnamed Dai, in a state-owned engineering company, Beijing-based China Harbour Engineering, of helping to move $3 million of corruption-tainted gains via a Chinese underground bank onshore. The underground bank used a technique that regulators called an “audit hedge,” essentially depositing 18 million yuan in Mr. Dai’s onshore account in exchange for an equivalent amount of foreign exchange placed in the underground bank’s offshore account. No money crosses the border physically or electronically, making the transaction almost perfectly undetectable — hence “a hedge against audits.”

Another method is an old fave: Shell accounts. The article stated:

In Ningxia, a small northwestern region home to China’s Hui ethnic minority, criminal gangs in the provincial capital Yinchuan set up 12 trading shells that did nothing but generate false export data as a means to move money in or out of the country under the guise of legitimate corporate payments, according to Xinhua. Companies are allowed to move foreign exchange exceeding China’s $50,000 annual limit for legitimate purposes. Police found that the gangs marked the funds that moved through their shell accounts as “national export incentive awards” obtained from the Yinchuan City Bureau of Finance. Investigators alleged that the gangs used the scheme to defraud the Ningxia government since 2013 of export incentives worth 38.6 million yuan ($6 million). Export scams like these usually facilitate illegally moving funds onshore, rather than offshore. China controls foreign exchange coming onshore just as it does money trying to move offshore. The Ningxia case stemmed from 2013, when China was experiencing a high level of net capital inflows.

When will digital currencies facilitate money laundering on this supersized scale? Not surprisingly, verifiable data about the volume of money laundering via digital currencies is tough to obtain.

I would point out that old fashioned methods still have their use. Investigators, therefore, have to rely on useful software like Maltego and add ins and have the resources to dig out information the old fashioned way. This is not just feet on the street; it is humans pulling information threads.

Stephen E Arnold, November 27, 2015

IBM and Digital Piracy: Just Three Ways?

November 27, 2015

I read “Preventing Digital Piracy: 3 Ways to Use Big Data to Protect Content.” I love making complicated issues really easy. Remember the first version of the spreadsheet? Easy. Just get a terminal, wrangle the team to install LANPAR, and have at it. Easy as 1-2-3, which came after VisiCalc.

Ah, LANPAR. You remember that, right? I have fond memories of Language for Programming Arrays at Random, don’t you? I still think the approach embodied in that software was a heck of a lot more user friendly than filling in tiny rectangular areas with a No. 4 pencil and adding and subtracting columns using an adding machine.

IBM has cracked digital piracy by preventing it. Now I find that notion fascinating. On a recent trip, I noted that stolen software and movies were more difficult to find. However, a question or two of the helpful folks at a computer store in Cape Town revealed a number of tips for snagging digital content. One involved a visit to a storefront in a township. Magic. Downloaded stuff on a USB stick. Cheap, fast, unmonitored.

IBM’s solution involves streaming data. Okay, but maybe streaming for some content is not available; for example, a list of firms identified by an intelligence agency as “up and comers.”

IBM also wants me to build a real time feedback loop. That sounds great, but the angle is not rules. The IBM approach is social media. This fix also involves live streaming. Not too useful when the content is not designed to entertain.

The third step wants me to perform due diligence. I am okay with this, but then what? When I worked at a blue chip consulting firm, the teams provided specific recommendations. The due diligence is useless without informed, affordable options and the resources to implement, maintain, and tune the monitoring activity.

I am not sure what IBM expects me to do with these three steps. My initial reaction is that I would do what charm school at Booz, Allen taught decades ago; that is, figure out the problem, identify the options, and implement the approach that had the highest probability of resolving the issue. The job is not to generalize. Proper scope helps ensure success.

If I wanted to prevent digital piracy, I would look to companies which have sophisticated, automatable methods to identify and remediate issues.

IBM, for example, does not possess the functionality of a company like Terbium Labs. There are other innovators dealing with leaking data. I could use LANPAR to do certain types of spreadsheet work. But why? Forward looking solutions do more than offer trivial 1-2-3.

Stephen E Arnold, November 27, 2015

Insight into Hacking Team

November 25, 2015

Short honk: Curious about the world of exploits available to governments and other authorized entities? You may find “Metadata Investigation: Inside Hacking Team” interesting. Keep in mind that “metadata” means indexes, entity extraction, and other controlled and uncontrolled data content. The report from Share Lab was online on November 23, 2015, when I last checked the link. I discuss Hacking Team and several other firms in my forthcoming monograph about the Dark Web.

Stephen E Arnold, November 25, 2015

Profile of the Equation Group

November 25, 2015

Short honk: I overlooked a link from one of the goslings from early 2015. The Kaspersky report about the Equation Group triggered some media commentary. The report, quite to my surprise, is still available online (or it was when I verified the link on November 23, 2015). If you are interested in information access using unconventional or at least not Emily Post approved methods, you can download “Equation Group: Questions and Answers”, Version 1.5 from Secure List.

Stephen E Arnold, November 25, 2015

Improper Information Access: A Way to Make Some Money

November 24, 2015

I read “Zerodium Revealed Prices” (original is in Russian). The main point of the write up is that exploits or hacks are available for a price. Some of these are attacks which may not be documented by the white hat folks who monitor the exploit and malware suburbs connected to the information highway.

The paragraph I noted explained what Zerodium will pay for a fresh, juicy exploit.

image

Here’s the explanation. Please recognize that Russian, unlike for one of my relatives, is not my go-to language:

For a remote control access exploit which intercepts the victim’s computer through Safari or Microsoft’s browser company is willing to pay $50,000. A more sophisticated “entry point” is considered Chrome: for the attack through Zerodium pays $80,000. Zerodium will pay $5,000 for a vulnerability in WordPress, Joomla and Drupal. Breaking the TorBrowser can earn the programmer about $30,000… A remote exploit bypassing the protection Android or Windows Phone, will bring its author a $100,000. A working exploit of iOS will earn the developer $500,000.

Zerodium explains itself this way:

Zerodium is a privately held and venture backed startup, founded by cybersecurity veterans with unparalleled experience in advanced vulnerability research and exploitation. We’ve created Zerodium to build a global community of talented and independent security researchers working together to provide the most up-to-date source of cybersecurity research and capabilities.

The company’s logo is nifty too:

image

The purple OD emphasizes the zero day angle. Are exploits search and information access? Yep, they can be. Not advocating, just stating a fact.

Stephen E Arnold, November 24, 2015

The Art of Martec Content via a Renaissance Diagram

November 22, 2015

I love diagrams which explain content processing. I am ecstatic when a diagram explains information, artificial intelligence, and so much more. I feel as if I were a person from the Renaissance lowered into Nero’s house to see for the first time the frescos. Revelation. Perhaps this diagram points to a modern day Leonardo.

Navigate to “Marketing Data Technology: Making Sense of the Puzzle.” I admire the notion that marketing technology produces data. I love that tracking stuff, the spyware, the malware, and the rest of the goodies sales professionals use to craft their masterpieces. The idea that the data comprise a puzzle is a stroke of brilliance.

How does one convert data into a sale? Martec, marcom, or some other mar on one’s life?

Here’s the diagram. You can view a larger size at this link:

Marketing Data Technology Map

Notice the “space” is divided into four areas: discover, decide, activate, and automate. Notice that there are many functions in each area; for example, decide includes information delivery, insight real time, and marketing performance. Then notice that the diagram includes a complex diagram with a backbone, a data lake, the Web, social media, and acronyms which mean nothing to me. These are like the artistic flourishes on that hack’s paintings in the Sistine Chapel. The touches delight the eye, but no one cares about the details.

Now, I presume, you know how to make sense of the martec puzzle.

I find this type of diagram entertaining. I am not sure if it is a doodle or the Theory of Relativity for marketing professionals. Check out the original. I am still chuckling.

Stephen E Arnold, November 22, 2015
