Google Seeks SEO Pro
August 12, 2015
Well, isn’t this interesting. Search Engine Land tells us that “Google Is Hiring an SEO Manager to Improve its Rankings in Google.” The Goog’s system is so objective, even Google needs a search engine optimization expert! That must be news to certain parties in the European Union.
Reporter Barry Schwartz spotted the relevant job posting at the company’s Careers page. Responsibilities are as one might expect: develop and maintain websites; maintain and develop code that will engage search engines; keep up with the latest in SEO techniques; and work with the sales and development departments to implement SEO best practices. Coordination with the search-algorithm department is not mentioned.
Google still stands as one of the most sought-after employers, so it is no surprise the company requires a lot of anyone hoping to fill the position. Schwartz notes, though, that link-building experience is not specified. He shares the list of criteria:
“The qualifications include:
* BA/BS degree in Computer Science, Engineering or equivalent practical experience.
* 4 years of experience developing websites and applications with SQL, HTML5, and XML.
* 2 years of SEO experience.
* Experience with Google App Engine, Google Custom Search, Webmaster Tools and Google Analytics and experience creating and maintaining project schedules using project management systems.
* Experience working with back-end SEO elements such as .htaccess, robots.txt, metadata and site speed optimization to optimize website performance.
* Experience in quantifying marketing impact and SEO performance and strong understanding of technical SEO (sitemaps, crawl budget, canonicalization, etc.).
* Knowledge of one or more of the following: Java, C/C++, or Python.
* Excellent problem solving and analytical skills with the ability to dig extensively into metrics and analytics.”
Lest anyone doubt the existence of such an ironic opportunity, the post reproduces a screenshot of the advertisement, “just in case the job is pulled.”
Cynthia Murrell, August 12, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Exclusive Interview: Danny Rogers, Terbium Labs
August 11, 2015
Editor’s note: The full text of the exclusive interview with Dr. Daniel J. Rogers, co-founder of Terbium Labs, is available on the Xenky Cyberwizards Speak Web service at www.xenky.com/terbium-labs. The interview was conducted on August 4, 2015.
Significant innovations in information access, despite the hyperbole of marketing and sales professionals, are relatively infrequent. Danny Rogers, one of the founders of Terbium Labs, has developed a way to flip on the lights and make it easy to locate information hidden on the Dark Web.
Web search has been a one-trick pony since the days of Excite, HotBot, and Lycos. For most people, a mobile device takes cues from the user’s location and click streams and displays answers. Access to digital information requires more than parlor tricks and pay-to-play advertising. A handful of companies are moving beyond commoditized search, and they are opening important new markets such as the detection of secret and high-value data theft. Terbium Labs can “illuminate the Dark Web.”
In an exclusive interview, Dr. Danny Rogers, who co-founded Terbium Labs with Michael Moore, explained the company’s ability to change how data breaches are located. He said:
Typically, breaches are discovered by third parties such as journalists or law enforcement. In fact, according to Verizon’s 2014 Data Breach Investigations Report, that was the case in 85% of data breaches. Furthermore, discovery, because it is by accident, often takes months, or may not happen at all when limited personnel resources are already heavily taxed. Estimates put the average breach discovery time between 200 and 230 days, an exceedingly long time for an organization’s data to be out of their control. We hope to change that. By using Matchlight, we bring the breach discovery time down to between 30 seconds and 15 minutes from the time stolen data is posted to the web, alerting our clients immediately and automatically. By dramatically reducing the breach discovery time and bringing that discovery into the organization, we’re able to reduce damages and open up more effective remediation options.
Terbium’s approach, it turns out, can be applied to traditional research into content domains to which most systems are effectively blind. At this time, a very small number of companies are able to index content that is not available to traditional content processing systems. Terbium acquires content from Web sites that require specialized software to access. Terbium’s system then processes the content, converting it into the equivalent of an old-fashioned fingerprint. Real-time pattern matching makes it possible for the company’s system to locate a client’s content, whether in textual form, software binaries, or other digital representations.
One of the most significant information access innovations uses systems and methods developed by physicists to deal with the flood of data resulting from research into the behaviors of difficult-to-differentiate subatomic particles.
One part of the process is for Terbium to acquire (crawl) content and convert it into encrypted 14-byte strings of zeros and ones. A client such as a bank then uses the Terbium encryption and conversion process to produce the same kind of representations of its confidential data, computer code, or other digital assets. Terbium’s system, in effect, looks for matching digital fingerprints. The task of locating confidential or proprietary data via traditional means is expensive and often a hit-and-miss affair.
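To make the fingerprint-matching idea concrete, here is a minimal Python sketch under stated assumptions. It is not Terbium’s patented scheme; the window size, truncated SHA-256 digests, and sample strings are placeholders, and Matchlight uses its own private fingerprinting protocol. The sketch merely shows how a crawler and a client can each reduce content to short hashes and compare those hashes without either side exchanging the underlying data:

import hashlib

def fingerprints(text, window=16):
    # Slide a fixed-size window over the content and hash each chunk.
    # Only the hashes leave the machine, never the raw bytes.
    data = text.encode("utf-8")
    return {
        hashlib.sha256(data[i:i + window]).hexdigest()[:28]  # 14 bytes of digest, for illustration only
        for i in range(0, max(len(data) - window + 1, 1))
    }

# Crawler side: index fingerprints of content harvested from monitored sites.
crawled_index = fingerprints("dump of stolen records: ACCT 0042-7791 Jane Doe and more")

# Client side: fingerprint the confidential record locally; the raw data never leaves.
client_prints = fingerprints("ACCT 0042-7791 Jane Doe")

# Matching side: any overlap between the two fingerprint sets signals a probable leak.
if crawled_index & client_prints:
    print("Possible exposure detected; alert the client.")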
Terbium Labs changes the rules of the game and in the process has created a way to provide its licensees with anti-fraud and anti-theft measures which are unique. In addition, Terbium’s digital fingerprints make it possible to find, analyze, and make sense of digital information not previously available. The system has applications ranging from the Clear Web, which millions of people access every minute, to the hidden content residing on the so-called Dark Web.
Terbium Labs, a startup located in Baltimore, Maryland, has developed technology that makes use of advanced mathematics—what I call numerical recipes—to perform analyses for the purpose of finding connections. The firm’s approach deals with strings of zeros and ones, not the actual words and numbers in a stream of information. By matching these numerical tokens against content such as a data file of classified documents or a record of bank account numbers, Terbium accomplishes what strikes many, including me, as a remarkable achievement.
Terbium’s technology can identify highly probable instances of improper use of classified or confidential information. Terbium can pinpoint where the compromised data reside, whether on the Clear Web, another network, or the Dark Web. Terbium then alerts the organization about the compromised data and works with the victim of Internet fraud to resolve the matter in a satisfactory manner.
Terbium’s breakthrough has attracted considerable attention in the cyber security sector, and applications of the firm’s approach are beginning to surface for disciplines from competitive intelligence to health care.
Rogers explained:
We spent a significant amount of time working on both the private data fingerprinting protocol and the infrastructure required to privately index the dark web. We pull in billions of hashes daily, and the systems and technology required to do that in a stable and efficient way are extremely difficult to build. Right now we have over a quarter trillion data fingerprints in our index, and that number is growing by the billions every day.
The idea for the company emerged from a conversation with a colleague who wanted to find out immediately if a high-profile client list was ever leaked to the Internet. But, said Rogers, “This individual could not reveal to Terbium the list itself.”
How can an organization locate its secret information if that information cannot be provided to the system doing the searching?
The solution Terbium’s founders developed relies on a novel combination of encryption techniques, tokenization, Clear and Dark Web content acquisition and processing, and real-time pattern matching methods. The interlocking innovations have been patented (US 8,997,256), and Terbium is one of the few companies in the world, perhaps the only one, able to crack open Dark Web content within regulatory and national security constraints.
Rogers said:
I think I have to say that the adversaries are winning right now. Despite billions being spent on information security, breaches are happening every single day. Currently, the best the industry can do is be reactive. The adversaries have the perpetual advantage of surprise and are constantly coming up with new ways to gain access to sensitive data. Additionally, the legal system has a long way to go to catch up with technology. It really is a free-for-all out there, which limits the ability of governments to respond. So right now, the attackers seem to be winning, though we see Terbium and Matchlight as part of the response that turns that tide.
Terbium’s product is Matchlight. According to Rogers:
Matchlight is the world’s first truly private, truly automated data intelligence system. It uses our data fingerprinting technology to build and maintain a private index of the dark web and other sites where stolen information is most often leaked or traded. While the space on the internet that traffics in that sort of activity isn’t intractably large, it’s certainly larger than any human analyst can keep up with. We use large-scale automation and big data technologies to provide early indicators of breach in order to make those analysts’ jobs more efficient. We also employ a unique data fingerprinting technology that allows us to monitor our clients’ information without ever having to see or store their originating data, meaning we don’t increase their attack surface and they don’t have to trust us with their information.
For more information about Terbium, navigate to the company’s Web site. The full text of the interview appears on Stephen E Arnold’s Xenky cyberOSINT Web site at http://bit.ly/1TaiSVN.
Stephen E Arnold, August 11, 2015
How to Use Watson
August 7, 2015
While there are many possibilities for cognitive computing, what makes an idea a reality is its feasibility and real-life application. The Platform explores “The Real Trouble With Cognitive Computing” and the troubles IBM had (has) trying to figure out what it is going to do with the supercomputer it built. The article explains that before Watson became a Jeopardy celebrity, the IBM folks came up with 8,000 potential experiments for Watson to do, but pursued only 20 percent of them.
The range is small due to many factors, including bug testing, gauging progress with fuzzy outputs, experimenting with algorithmic interactions, testing in isolation, and more. This leads to a “messy” way of developing the experiments. Ideally, developers would have a big knowledge model they could query, but that option does not exist. The messy way involves keeping data sources intact, applying natural language processing, machine learning, and knowledge representation, and then distributing the result across an infrastructure.
Here is another key point that makes clear sense:
“The big issue with the Watson development cycle too is that teams are not just solving problems for one particular area. Rather, they have to create generalizable applications, which means what might be good for healthcare, for instance, might not be a good fit—and in fact even be damaging to—an area like financial services. The push and pull and tradeoff of the development cycle is therefore always hindered by this—and is the key barrier for companies any smaller than an IBM, Google, Microsoft, and other giants.”
This is exactly correct! Engineering is not the same as healthcare, and not all computer algorithms transfer over to different industries. One thing to keep in mind is that you can apply different methods from other industries and come up with new methods or solutions.
Whitney Grace, August 7, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
The Race to Predict Began Years Ago: Journalism as Paleontology
August 4, 2015
I love reading the dead tree edition of the Wall Street Journal. This morning I learned that “Apple and Google Race to Predict What You Want.” The print story appears in the Business & Tech section on B1 and B6 for August 4, 2015. Note that the online version of the story has this title: “Apple and Google Know What You Want before You Do.” For me, there is a difference between a “race” and “knowing.”
Nevertheless, the write up is interesting because of what is omitted. The story seems to fixate on mobile phone users and the notion of an assistant. The first thing I do with my mobile phone is find a way to disable this stuff. I dumped my test Microsoft phone because the stupid Cortana button was in a location which I inadvertently pressed. The Blackberry Classic is equally annoying, defaulting to a screen which takes three presses to escape. The iPhones and Android devices cannot understand my verbal instructions. Try looking up a Russian or Spanish name. Let me know how that works for you.
Now, what is omitted from the write up? Three points struck me as warranting a mention:
- Predictive methods are helping reduce latency and unnecessary traffic (hence cost) between the user’s device and the service with the “answer”
- Advertisers benefit from predictive analytics. Figuring out that someone wants food opens the door to a special offer. Why not cue that up in advance?
- Predictive technology is not limited to mobile applications. Google invested some bucks into an outfit called Recorded Future. What does Recorded Future do? Answer: Predictive analytics with a focus on time. The GOOG, like Apple, is mostly time blind.
Predictive methods are not brand-spanking new to those who have followed the antics of physicists since Einstein’s miracle year. For the WSJ and its canines, whatever seems bright and shiny today is new.
Stephen E Arnold, August 4, 2015
Online Ads Discriminate
August 3, 2015
In our modern age, discrimination is supposed to be a thing of the past. When it does appear, people take to the Internet to vent their rage and frustrations, eager to point out this illegal activity. Online ads, however, lack human intelligence and are only as smart as their programmed algorithm. Technology Review explains in “Probing The Dark Side of Google’s Ad-Targeting System” that Google’s ad service makes inaccurate decisions when it comes to gender and other personal information.
A research team at Carnegie Mellon University and the International Computer Science Institute built AdFisher, a tool to track targeted third-party ads on Google. AdFisher found that ads were discriminating against female users. Google offers a transparency tool that allows users to select what types of ads appear on their browsers, but even if you use the tool, it does not stop some of your personal information from being used.
“What exactly caused those specific patterns is unclear, because Google’s ad-serving system is very complex. Google uses its data to target ads, but ad buyers can make some decisions about demographics of interest and can also use their own data sources on people’s online activity to do additional targeting for certain kinds of ads. Nor do the examples breach any specific privacy rules—although Google policy forbids targeting on the basis of “health conditions.” Still, says Anupam Datta, an associate professor at Carnegie Mellon University who helped develop AdFisher, they show the need for tools that uncover how online ad companies differentiate between people.”
The transparency tool only controls some of the ads and third parties can use their own tools to extract data. Google stands by its transparency tool and even offers users the option to opt-out of ads. Google is studying AdFisher’s results and seeing what the implications are.
The study shows that personal data spills out onto the Internet every time we click a link or use a browser. It is frightening how the data can be used, and even hurtful when ads interpret it incorrectly. The bigger question is not how retailers and Google use the data, but how government agencies and other institutions plan to use it.
Whitney Grace, August 3, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Monkeys Cause System Failure
July 28, 2015
Nobody likes to talk about his or her failures. Admitting to failure proves that you failed at a task in the past, and it is a big blow to the ego. Failure admission is even worse for technology companies, because users want to believe technology is flawless. On Microsoft’s Azure Blog, Heather Nakama posted “Inside Azure Search: Chaos Engineering,” in which she explains that software engineers are aware that failure is unavoidable. Rather than trying to prevent failure, they welcome potential failure. Why? It allows them to test software and systems to prevent problems before they develop.
Nakama mentions it is not sustainable to account for every potential failure and to test the system every time it is upgraded. Azure Search borrowed chaos engineering from Netflix to resolve the issue, and it is run by a bunch of digital monkeys:
“As coined by Netflix in a recent excellent blog post, chaos engineering is the practice of building infrastructure to enable controlled automated fault injection into a distributed system. To accomplish this, Netflix has created the Netflix Simian Army with a collection of tools (dubbed “monkeys”) that inject failures into customer services.”
Azure Search basically unleashes its Search Chaos Monkey to wreak havoc in the system, then learns about system weaknesses and repairs them accordingly. There are several chaos levels: low, medium, and high, with each permitting more possible damage. At each level, the Search Chaos Monkey is given more destructive tools to “play” around with. The high levels are the most valuable to software engineers, because they surface the largest and worst failures.
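For readers curious what “controlled automated fault injection” can look like, here is a rough, hypothetical Python sketch. It is not Netflix’s Simian Army or Azure Search’s tooling; the probabilities and chaos tiers below are invented for illustration, loosely modeled on the low/medium/high levels described above:

import random
import time

# Assumed chaos tiers: each level enables progressively nastier faults.
CHAOS_LEVELS = {
    "low":    {"latency_prob": 0.05, "error_prob": 0.00, "max_delay": 0.5},
    "medium": {"latency_prob": 0.20, "error_prob": 0.05, "max_delay": 2.0},
    "high":   {"latency_prob": 0.40, "error_prob": 0.20, "max_delay": 5.0},
}

def chaos_call(service_fn, level, *args, **kwargs):
    # Invoke the service, possibly injecting a delay or an outright failure first.
    cfg = CHAOS_LEVELS[level]
    if random.random() < cfg["latency_prob"]:
        time.sleep(random.uniform(0, cfg["max_delay"]))          # simulate slowness
    if random.random() < cfg["error_prob"]:
        raise RuntimeError("chaos monkey killed this request")   # simulate an outage
    return service_fn(*args, **kwargs)

def search_service(query):
    return f"results for {query!r}"

# Run a batch of requests at high chaos and count how many survive.
survived = 0
for i in range(20):
    try:
        chaos_call(search_service, "high", f"query-{i}")
        survived += 1
    except RuntimeError:
        pass  # a resilient caller would retry or fail over here
print(f"{survived}/20 requests survived high chaos")

The point of the exercise is the same as the monkeys’: watching which callers retry, fail over, or fall apart tells the engineers where the weaknesses are.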
While letting a bull loose in a china shop is bad, because you lose your merchandise, letting a bunch of digital monkeys loose in a computer system is actually beneficial. It remains true that you can learn from failure. I just hope that the digital monkeys do not have digital dung.
Whitney Grace, July 28, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Google’s Chauvinistic Job Advertising Delivery
July 28, 2015
I thought we were working to get more women into the tech industry, not fewer. That’s why it was so disappointing to read, “Google Found to Specifically Target Men Over Women When It Comes to High-Paid Job Adverts” at IBTimes. It was a tool dubbed AdFisher, developed by some curious folks at Carnegie Mellon and the International Computer Science Institute, that confirmed the disparity. Knowing that internet-usage tracking determines what ads each of us sees, the researchers wondered whether such “tailored ad experiences” were limiting employment opportunities for half the population. Reporter Alistair Charlton writes:
“AdFisher works by acting as thousands of web users, each taking a carefully chosen route across the internet in such a way that an ad-targeting network like Google Ads will infer certain interests and characteristics from them. The programme then records which adverts are displayed when it later visits a news website that uses Google’s ad network. It can be set to act as a man or woman, then flag any differences in the adverts it is shown.
“Anupam Datta, an associate professor at Carnegie Mellon University, said in the MIT Technology Review: ‘I think our findings suggest that there are parts of the ad ecosystem where kinds of discrimination are beginning to emerge and there is a lack of transparency. This is concerning from a societal standpoint.’”
Indeed it is, good sir. The team has now turned AdFisher’s attention to Microsoft’s Bing; will that search platform prove to be just as chauvinistic? For its part, Google says it is looking into the study’s methodology to “understand its findings.” It remains to be seen what sort of parent the search giant will be; will it simply defend its algorithmic offspring, or demand it mend its ways?
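For readers who want a feel for how such an experiment flags a disparity, here is a rough, hypothetical Python sketch, not AdFisher’s actual code. The counts are made-up placeholders; the idea is simply to compare how often a given ad was shown to two groups of simulated profiles and ask whether a gap that large could plausibly be chance:

import random

# Hypothetical observations: 1 means the high-paying-job ad was shown to that
# simulated profile, 0 means it was not. The counts are illustrative only.
male_profiles   = [1] * 180 + [0] * 820   # 1,000 profiles browsing as "male"
female_profiles = [1] * 30  + [0] * 970   # 1,000 profiles browsing as "female"

def permutation_test(a, b, trials=2000):
    # Probability of seeing a gap this large if the group labels were assigned at random.
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    extreme = 0
    for _ in range(trials):
        random.shuffle(pooled)
        a_perm, b_perm = pooled[:len(a)], pooled[len(a):]
        gap = abs(sum(a_perm) / len(a_perm) - sum(b_perm) / len(b_perm))
        if gap >= observed:
            extreme += 1
    return extreme / trials

p = permutation_test(male_profiles, female_profiles)
print(f"p-value for the ad-serving gap: {p:.4f}")  # a tiny p-value means the gap is unlikely to be chance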
Cynthia Murrell, July 28, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Unemployed in Search or Content Processing? Go for Data Science
July 27, 2015
I read an amazing write up. The title of this gem of high school counseling is “7 Skills/Attitudes to Become a Better Data Scientist.” What does one need to be a better data scientist? Better Python or R programming methods? Sharper mathematical intuition? The ability to compute the least upper bound (sup) and greatest lower bound (inf) of a set of real numbers in your head, without paper, and none of that Mathematica software? Wrong.
What you need is intellectual curiosity, an understanding of business, the ability to communicate (none of the Cool Hand Luke pithiness), knowledge of more than one programming language, knowledge of SQL, participation in competitions, and a habit of reading articles like “7 Skills and Attitudes.”
Yep, follow these tips and you too can be a really capable data scientist. Why wait? Act now. Read the “7 Skills” article. Nah, don’t worry about such silly notions as data integrity or statistical procedures. Talk to someone, anyone and you will be 14.28 percent of the way to your goal.
Stephen E Arnold, July 27, 2015
The Skin Search
July 15, 2015
We reported on how billboards in Russia were getting smarter by using facial recognition software to hide ads for illegal products when they recognized police walking by. Now the US government might be working on technology that can identify patterns in tattoos, reports Quartz in “The US Government Wants Software That Can Detect And Interpret Your Tattoos.”
The Department of Justice, Department of Defense, and the FBI sponsored a competition that the National Institute of Standards and Technology (NIST) recently held on June 8 to research ways to identify ink:
“The six teams that entered the competition—from universities, government entities, and consulting firms—had to develop an algorithm that would be able to detect whether an image had a tattoo in it, compare similarities in multiple tattoos, and compare sketches with photographs of tattoos. Some of the things the National Institute of Standards and Technology (NIST), the competition’s organizers, were looking to interpret in images of tattoos include swastikas, snakes, dragons, guns, unicorns, knights, and witches.”
The idea is to use visual technology to track tattoos among crime suspects and to map relational patterns. Vision technology, however, is still being perfected. Companies like Google and major universities are researching ways to make headway in the technology.
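None of the competitors’ algorithms are described in detail, but a toy version of “compare similarities in multiple tattoos” can be sketched with a perceptual hash: shrink each image to a tiny grayscale grid, threshold it against its mean brightness, and count how many bits differ. This is a naive stand-in, not what NIST’s entrants built (real tattoo matching would rely on learned visual features), and the file names below are placeholders:

from PIL import Image  # Pillow

def average_hash(path, hash_size=8):
    # Shrink to a hash_size x hash_size grayscale grid, then threshold on the mean brightness.
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming_distance(h1, h2):
    # Number of differing bits; smaller means the images look more alike.
    return sum(a != b for a, b in zip(h1, h2))

query = average_hash("suspect_tattoo.jpg")        # placeholder file names
known = average_hash("database_tattoo_042.jpg")
if hamming_distance(query, known) <= 10:          # arbitrary similarity threshold
    print("Probable match; flag for human review.")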
While the visual technology can be used to track suspected criminals, it can also be used for other purposes. One implication is responding to accidents as they happen instead of merely recording them. Tattoo recognition is the perfect place to start, given the variety of ink out there and its correlation to gangs and crime. The question remains: what will they call the new technology? Skin search?
Whitney Grace, July 15, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Google and Smart Software: We Are a Leader, Darn It
July 10, 2015
Google once posted lots of research papers. Over the years, finding them has required me to participate in many Easter Egg hunts. I came across a page on the Google Research Blog that is titled (fetchingly, I might add) “ICML 2015 and Machine Learning Research at Google.” Many people within companies are into smart software. This page should be called, “Darn it. We are leaders in machine learning.”
Two things are evident on the Google Web page.
- Google is sending a platoon of its centurions to the International Conference on Machine Learning. Attendees will have many opportunities to speak with Googlers.
- Google is displaying a range of machine learning expertise, from typical data crunching exercises like reducing internal covariate shift (see the sketch after this list) to research that provides a glimpse of what Google will do with its knowledge. See, for example, “Fictitious Self Play in Extensive Form Games.” Google just really wants to do the game thing and get the game revenue, gentle reader.
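“Reducing internal covariate shift,” by the way, is the batch normalization idea behind one of the ICML 2015 papers: normalize each feature across the mini-batch, then let the network rescale and shift the result. A bare-bones NumPy sketch of the forward pass follows; it is a generic illustration, not Google’s implementation:

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch_size, features) activations; gamma and beta: learned scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta

# Tiny demo: a batch of 4 examples with 3 features, deliberately shifted and scaled.
x = np.random.randn(4, 3) * 10 + 5
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0))  # approximately zero after normalization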
I suggest you download the papers quickly. If Google loses enthusiasm for making these documents available, you will end up having to buy them when the research surfaces in assorted journals. Some papers, once off the Google page, may be unavailable unless you are pals with the author or authors.
Good stuff here. Take that, Baidu, Facebook, et al.
Stephen E Arnold, July 10, 2015