Power Search for Open Source Developers

September 6, 2013

Open source is sweeping across the world as a solution revolution. It is making technology cheaper and more widely available. It could have far-reaching positive consequences in fields from education to aerospace technology, but all revolutions need a little help getting off the ground.

Open source projects need all the help they can get: if not funding, then volunteers contributing code and free tools they can brandish. Search engines tuned with algorithms to find source code are among the tools for the kit bag. While code reuse is a much debated topic in higher circles, these engines can help beginner programmers, and anyone working through a coding logjam, by cross-referencing their code.

Makeuseof.com points to the article “Open Source Matters: 6 Source Code Search Engines You Can Use For Programming Projects,” which lists code search engines to help developers with their projects. Ohloh Code is one of the largest code search engines, with over ten billion lines of code in its system. It allows users to search by different code classes, but it does not currently support regular expressions. SearchCode searches through open source communities such as GitHub, SourceForge, and CodePlex; amazingly, a single person maintains it. For those whose code contains special symbols, Google and other engines cannot cut it. That is where Symbol Hound sniffs around the Net for odd characters.

There are a few more code search engines described in the article; head on over and read it on your own. Code search engines are indicative of the open source mentality: share and spread the wealth.

Whitney Grace, September 06, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

The New Yahoo: A Logo and Pressure on an Executive

September 5, 2013

Yahooooo. I read two stories about the grandma of Web sites. The first was “Introducing Our New Logo!” I like the exclamation point. The logo is okay, but it seems to be cosmetic. When I was in Portugal in August, Yahoo would not render 70 percent of the time. Why? I am no rocket scientist, so I suppose I could blame the hapless Portuguese connectivity providers. But Gmail worked about 90 percent of the time, so maybe the problem is Yahoo’s. Will a new logo address the timeouts? One hopes.

Then I read “Pressure Mounts on Yahoo’s De Castro.” No exclamation point after Yahoo, however. The main point of the write up in my opinion was:

Sources close to Yahoo say that De Castro is feeling increasing pressure to deliver better ad results, as the blustery exec has found himself on the outs with CEO Marissa Mayer. There even has been talk that De Castro could be gone by the end of the year, according to numerous sources. The big knock against De Castro is, despite Mayer’s string of mobile acquisitions, lots of positive press and the massive Tumblr deal, the company’s ad business has languished in a marketplace that is enjoying robust growth. Particularly alarming is that Yahoo’s display business is getting hit on both the branding front and programmatic, which would theoretically be a De Castro strength, given his Google background.

My thought is that a new logo and creating discomfort for senior managers adds a different octave to the Yahoo yodel. Do I hear a screech? No, no. The sound is what I hear when one of the goslings tries to:

  1. Figure out which page will display when accessing Yahoo.com
  2. Look at search results which have modest relevance to the query
  3. Scan a shopping search result.

I hope that the new logo and excellent management will make the Yahoo yodel more melodious for the fellow in Big Bear, California.

Stephen E Arnold, September 5, 2013

Sponsored by Xenky. Oh, wait. I am Xenky.

History of Web Indexing: BBC Style

September 4, 2013

I read “Jonathon Fletcher: Forgotten Father of the Search Engine.” I have no quibble with the claims that the first Web crawler was an invention spawned in the United Kingdom.

I did find several interesting factoids in the write up; for example:

  1. Google has indexed more than one trillion pages. On the surface, this sounds just super. However, what is the cost of maintaining the index of the alleged one trillion pages? Is Google cutting corners in its indexing to reduce costs? Perhaps the BBC will expand on this statement. A trillion is a big number and I wonder what percentage of those “pages” are indexed on a daily basis to keep the index fresh.
  2. “Because websites were added to the list manually, there was nothing to track changes to their content. Consequently, many of the links were quickly out-of-date or wrongly labeled.” Is this true today?
  3. “By June of 1994, JumpStation had indexed 275,000 pages. Space constraints forced Mr Fletcher to only index titles and headers of web pages, and not the entire content of the page, but even with this compromise, JumpStation started to struggle under the load.” Decades ago the black hole of Web indexing was visible. Now that Big Data have arrived, won’t indexing costs rise in lock step? What cost savings are available? Perhaps indexing less content and changing the index refresh cycles are expedient actions? Have Bing, Google, and Yandex gone down this path? Perhaps the BBC will follow up on this issue?
  4. “But in my [Fletcher’s] opinion, the Web isn’t going to last forever. But the problem of finding information is.” Has progress been made in Web search?

One interesting aspect of the write up is the conflation of Web search with other types of search. The confusion persists, I believe.

Perhaps the BBC will look into the contributions to search of Dr. Martin Porter, the inventor of the Porter Stemmer. Dr. Porter’s Muscat search technology was important, arguably more important than Mr. Fletcher’s.

Stephen E Arnold, September 4, 2013

Sponsored by Xenky

Attivio Teams up with Capax Global

September 4, 2013

Attivio has signed up another partner, this time a leader in search. PR Newswire reveals, “Capax Global and Attivio Announce Strategic Reseller Partnership.” The move will help Capax Global’s customers smoothly shift from conventional enterprise search to the more comprehensive unified information access (UIA) approach. The press release quotes Capax Global CEO and managing director John Baiocco:

“We have seen a natural shift towards UIA as our enterprise search customers contend with massive volumes of information, coming from multiple sources, in different formats. Traditional approaches are no longer adequate in dealing with the scale and complexity of enterprise information. Attivio leads the industry in addressing the demands of big data volume, variety, and velocity that our customers face.”

David Schubmehl, research director at analysis firm IDC, also weighs in on the importance of UIA:

“Unified information access is the next logical progression beyond enterprise search as companies face unprecedented volumes of disparate information, of which 85 percent or more is unstructured. Because UIA platforms can integrate large volumes of information across disconnected silos, technologies like AIE have become a key enabler for big data analytics and decision support.”

Founded in 2007 and headquartered in Massachusetts, Attivio also has offices in other U.S. states, the U.K., Germany, and Israel. The company’s award-winning Active Intelligence Engine integrates structured and unstructured data, making it easier to translate information assets into useful business insights.

Capax Global celebrates its 20th birthday this year, making it a veteran in the search field. The privately-held company, based in New York, offers consulting services, custom implementations, and cloud-hosting services. An emphasis on its clients’ unique business objectives is no doubt part of its appeal for its many customers, which include Fortune 500 companies and major organizations around the world.

Cynthia Murrell, September 04, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Cape Town WordPress Event Promises More Developments in Autocomplete and Search

September 4, 2013

The article titled “ElasticSearch Soon Available for Websites” on HumanIPO discusses LightSpeed’s developments in WordPress. The program will soon emulate Google’s site search and will use autocomplete based on Google entries. This development is scheduled for August. The article states,

“Also keen to improve the WordPress search functionality, the coding expert explained the key role of data in promoting the tool to scalability.

“Hopefully one day the datascale in WordPress is going to scale better because they are looking at reworking the post-type database infrastructure,” Shaw said, relating his knowledge of future plans for the popular open source platform.

The launch of a hostage server environment is also anticipated in August with a new SSL offloading system similar to Google’s recently released SPDY.”

Ashley Shaw, the founder of LightSpeed, spoke at the WordPress event in Cape Town. It is the hope of Shaw and others at WordPress that such events will lead to an African community akin to WordCamp Europe. Kenya is particularly attractive, having taken mobile technology further. In the meantime, WordPress hopes to lower the cost of its new add-on service, which is currently set at US$2,500 for setup plus a US$1,119 monthly subscription charge.

Chelsea Kerwin, September 04, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Commerce Search Gets A Tad Bit Better With Enhancements

August 31, 2013

What is the difference between an enhancement and an upgrade? An upgrade indicates a whole new system and fixes for bugs. An enhancement implies that the current piece of software works well, but it is only being made better. Exorbyte, a commerce search application, announced its “Enhancements and Optimizations In July 2013.” The enhancements and optimizations come in the form of two new features. The first is facet normalization, which allows users to map different spellings and variations to a single facet value. Another cool thing about this feature is that users can specify the number of times an individual facet value appears in a search. This can push rarer data into search results and limit alternate forms of a query.
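Facet normalization of this kind can be pictured as a simple lookup from raw spelling variants to one canonical facet value. Here is a minimal sketch in Python; the alias table and function names are illustrative assumptions, not Exorbyte’s actual implementation:

```python
# Map raw facet spellings to one canonical value (hypothetical data).
FACET_ALIASES = {
    "color": {"grey": "gray", "Gray": "gray", "GREY": "gray"},
}

def normalize_facet(facet: str, raw_value: str) -> str:
    """Return the canonical value for a raw facet spelling variant."""
    aliases = FACET_ALIASES.get(facet, {})
    return aliases.get(raw_value, raw_value.lower())

def count_facet_values(facet, docs):
    """Count documents per canonical facet value, so variant
    spellings collapse into a single facet count."""
    counts = {}
    for doc in docs:
        value = normalize_facet(facet, doc.get(facet, ""))
        counts[value] = counts.get(value, 0) + 1
    return counts
```

With three documents tagged “grey”, “Gray”, and “blue”, the first two collapse into one “gray” facet entry instead of splitting the count across spellings.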

Here is the neatest new feature, query-based ranking:

“It is now possible to influence the ranking based on the query itself, allowing for even higher result relevancy and hence conversions. In a global ranking rule search terms can be defined that trigger this ranking rule to come into effect. For example, you can specify that the ranking rule “boost the category toy” is only activated when the query contains the word “ball”. If the query term was not used as a restriction, the category “toys” would always be boosted. For example if the query term was “golf” toy golf products would be placed on top, although users might expect professional equipment, so that this rule should not apply. So the query-based ranking allows you to tune the relevancy in cases.”

Putting the intelligence in intelligent search. It also reminds me of using an auto-tuner to pick up the proper frequency. Features like these help normalize search and make the results useful to the user. Exorbyte asserts its software can eliminate the need for manual facet normalization.
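The query-conditional boost described in the quote — the “toy” category gets boosted only when the query contains “ball” — can be sketched as a ranking rule with trigger terms. This is an illustrative Python sketch under assumed names, not Exorbyte’s actual rule engine:

```python
# Each rule boosts a category, but only when the query contains
# one of its trigger terms (the "toys"/"ball" example from the post).
RULES = [
    {"boost_category": "toys", "factor": 2.0, "triggers": {"ball"}},
]

def score(doc, base_score, query):
    """Apply query-conditional category boosts to a base relevance score."""
    terms = set(query.lower().split())
    s = base_score
    for rule in RULES:
        if rule["triggers"] & terms and doc.get("category") == rule["boost_category"]:
            s *= rule["factor"]
    return s
```

A query for “toy ball” doubles the score of toy-category products, while a query for “golf” leaves them unboosted, matching the article’s point that golf searchers likely expect professional equipment.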

Whitney Grace, August 31, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

LinkedIn: You Search or It Finds?

August 30, 2013

One of the ArnoldIT goslings manages my social media presence. We try to provide information via automation or by asking questions. The “Stephen E Arnold” profile provides some information, but the detail located at www.arnoldit.com/sitemap.html is not included.

I am not sure what my area of expertise is. As I approach 70 years in age, I have worked in jobs as diverse as machine shop janitor and advisor to the world’s largest search and software company. Along the way, I have labored inside nuclear facilities, sat in meetings which considered the fate of Fortune 500 companies, figured out how to make an online database produce a profit, and run laps for a person with $9 billion in personal assets.

I am surprised when my social media gosling reports that people are endorsing me for a wide range of capabilities. The most popular is analytics, which is okay. But my focus in analytics is how to make money. My relative, Vladimir Ivanovich Arnold, was into fancy math, which is supposed to “run in our family.” Whatever. The people recommending me are those who are “linked” to me. My view is that when someone wants to be my LinkedIn pal, the person should be involved in some way with content processing. I don’t recall most of the people, but some of the names are familiar. I stick close to Harrod’s Creek, Kentucky, and avoid the bright lights and big city.

Am I a monkey in a cage for those who pay LinkedIn for access to my “content”? Image from Alamogordo.

I was not surprised to read “Why Am I Being Endorsed for Skills and Expertise I Do Not Claim on my Profile?” (Note: I have no idea if you will be able to view this community post on LinkedIn. Your problem to solve, not mine.)

The main point of the post is:

I am receiving notices that I have been endorsed for skills that I have not listed on my profile. I have over 20 years of experience and may have done these tasks at some point, but these are not necessarily the same skills I want to highlight currently on my LinkedIn profile and I have not claimed expertise in these areas. Why are any of my contacts being asked to endorse me for skills I don’t want highlighted?

My answer to this question is, “Generate revenue.” But the most interesting item in this community thread comes from someone whom I assume is a LinkedIn employee, cheerleader, or amanuensis. Use the search function in your browser to jump to this snippet once you are in the community post I have cited, please:

Thank you all for the valuable feedback. Our team really appreciates it and we definitely take it into account as we continue to improve the user experience across all of our products and features. With that said, I wanted to clarify a few things regarding endorsements:

1. You can only be endorsed by a 1st degree connection (a LinkedIn member you already know and are directly connected with), and you can always manage which endorsements to show.


Blippex Takes a Fresh Approach to Web Search

August 29, 2013

This is an interesting angle: search that considers time users spend on each site, rather than the usual indicators, like link quantity and quality. GigaOM informs us about this unique approach in, “How Blippex Handles the Data Behind its Time-Driven Search Engine.” The premise is that users spend more time on sites that offer more value.

The budding Blippex is still working with a fairly small index, which is understandable considering it launched this year and has just started to get some traction. Though the Berlin-based company could have chosen to use one of the clouds floating over Europe (not literally), they are going with the web-startup flow and choosing Amazon Web Services. They are also relying on several open-source components, like MongoDB, Elasticsearch, and Redis. See the article for more details on Blippex’s use of those resources.

Writer Jordan Novet explains the unique approach, and points out one possible hitch to the time-spent model:

“The database Blippex uses keeps track of how much time users spend on a given website. The system has a way of making sure pages that sit idle — think of the tab that’s been open on your browser for three days — don’t get incorrectly interpreted as being the most valuable. . . .

“The thing is, web surfers might spend much more time poring over dense content, such as a paper in an academic journal, than on, say, a succinct news article about the same subject, even if the article is more successful at giving people just the information they’re looking for. In that case, time spent is not the best indicator of value.”

Novet makes a good point; we happy geese understand the value of a short, informative article. I checked out the site, and there is a slider aptly named “dwell factor,” with which the user can adjust how much influence time spent has on the results. If I don’t want to rely on dwell-time, though, why shouldn’t I just use Google? Well, privacy is one reason. Like DuckDuckGo, Blippex refuses to collect and share users’ information. In fact, they say, the Duck inspired their privacy policy.
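A dwell-time signal with an adjustable weight, like Blippex’s “dwell factor” slider, can be sketched as a blend of a conventional relevance score and a capped dwell-time score. The cap stands in for the idle-tab guard Novet describes; the constant, names, and formula are my assumptions, since Blippex has not published its ranking function:

```python
# Blend text relevance with a dwell-time signal. Dwell times are
# capped so an idle tab left open for days cannot dominate the score.
MAX_DWELL_SECONDS = 600  # assumed ceiling for "real" attention

def blended_score(relevance, dwell_seconds, dwell_factor):
    """dwell_factor in [0, 1]: 0 = pure relevance, 1 = pure dwell time."""
    capped = min(dwell_seconds, MAX_DWELL_SECONDS)
    dwell_score = capped / MAX_DWELL_SECONDS  # normalize to [0, 1]
    return (1 - dwell_factor) * relevance + dwell_factor * dwell_score
```

Sliding the dwell factor to 0 reproduces ordinary relevance ranking; sliding it to 1 ranks purely by (capped) time spent, so the three-day idle tab scores no higher than ten minutes of genuine reading.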

Curiously, Blippex has yet to reveal how they plan to make money on the service. In fact, according to Bloomberg Businessweek, the company claims to have no business model in mind at all. Now that is a daring approach!

Cynthia Murrell, August 29, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Source Code Search Engine Meanpath

August 25, 2013

I am surprised it took this long; we have learned about a relatively new search engine dedicated to ferreting out source code: meanpath. Each day, the service crawls some 220 million sites to capture text, page sources, and server headers. Their website explains:

“Meanpath is a new search engine that allows software developers to access detailed snapshots of millions of websites without having to run their own crawlers. Our clients use the information we gather from your site to help solve problems in these areas: Semantic analysis; Linguistics; Identity theft protection; Malware and virus analysis; We also request your favicon and apple-touch-icon if available for our favicond.com service.”

Well, okay. Meanpath goes on to explain how it accesses sites, and diligently supplies instructions for those who would keep its bot at bay. Also included is a disclaimer noting that spammers and other unethical operatives might refuse to heed the directives specified in a robots.txt file, but advising that a reverse DNS lookup will suss out any bad actors.
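The reverse DNS check mentioned above — reverse-resolve the visiting IP, confirm the hostname belongs to the claimed crawler’s domain, then forward-resolve the hostname to make sure it points back at the same IP — can be sketched like this. The resolver arguments are injectable so the logic works without live DNS; the meanpath domain suffix here is an assumption for illustration:

```python
import socket

def verify_crawler(ip, allowed_suffixes,
                   reverse=lambda ip: socket.gethostbyaddr(ip)[0],
                   forward=socket.gethostbyname):
    """Verify a bot via a reverse-then-forward DNS round trip.
    A spammer can fake a User-Agent header, but it cannot make an
    arbitrary IP reverse-resolve into the crawler's domain."""
    try:
        host = reverse(ip)          # IP -> hostname
    except OSError:
        return False
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False                # hostname is not in the bot's domain
    try:
        return forward(host) == ip  # hostname -> IP must round-trip
    except OSError:
        return False
```

With real DNS you would call it as `verify_crawler("203.0.113.5", (".meanpath.com",))`; the two resolver hooks let you substitute stubs for testing.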

Founded last year, meanpath serves a variety of internet-savvy businesses, from SEO consultants to hosting companies. The company makes its home in California.

Cynthia Murrell, August 25, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Surveillance Organization Unable to Search Own Employees’ Email

August 24, 2013

An article titled “NSA Says It Can’t Search Its Own Emails” on ProPublica brings up an interesting glitch in the NSA’s surveillance technology. In spite of having the capability to sort through big data with a supercomputer, when it comes to searching the email of the NSA’s more than 30,000 employees, the agency is at a loss. The article explains,

“‘There’s no central method to search an email at this time with the way our records are set up, unfortunately,’ NSA Freedom of Information Act officer Cindy Blacker told me last week. The system is ‘a little antiquated and archaic,’ she added. … It’s actually common for large corporations to do bulk searches of their employees’ email as part of internal investigations or legal discovery.”

The article also brings up the point that federal agencies often don’t have the funding they need for public records. However, if any agency should be able to keep tabs on its employees, it is the agency charged with surveillance of the nation. Lacking that ability, NSA staff must search emails one individual at a time instead of searching by keyword or in bulk. This is very interesting in light of recent events; no further comment.

Chelsea Kerwin, August 24, 2013

Sponsored by ArnoldIT.com, developer of Augmentext
