Indexing Teen Messages?

September 7, 2015

If you are reading teens’ SMS messages, you may need a lexicon of young speak. The UK Department of Education has applied tax dollars to help you decode PAW and GNOC. The problem is that http://parentinfo.org/ does not provide a link to the word list. What is available is a link to Netlingo’s $20 list of Internet terms.

Maybe I am missing something in “P999: What Teenage Messages Really Mean?”

For a list of terms teens and the eternally young use, check out these free links:

I love it when “real journalists” do not follow the links about which they write. Some of these folks probably find turning on their turn signal too much work as well.

Stephen E Arnold, September 7, 2015

Bing Does App Indexing

May 22, 2015

I am one of the few people who use my smartphone to make calls and respond to the text instructions from my wife. I am not into apps. I have a nice, multi screened desktop computer which allows me to do what I need and want to do. I am in the minority, and I quite like it that way.

I read “Make Apps Stand Out in Search with App Linking.” I suppose if I needed an app, I would want to be able to locate the candidate software for my consideration. Once I locate a suitable app, I want to read reviews and maybe—not very often—but maybe load a trial version to see if the app actually “apps.” I just submitted one of my for fee columns and titled it “In App or Inept.” The reason? Apps are not exactly the type of software I want to use.

Remember. I work at a desk, three monitors, 13 computers/servers, two high speed data connections, VPNs, and software my team and I built. Apps are not what meet my needs. But there are many attention challenged, entitlement fueled younger folks who are into the “app” thing. I think that most apps are inappropriate for the type of work I do and perhaps other folks should actually do.

I don’t telework or telecommute. I actually work, answer the phone, and produce outputs. Some of the outputs are software like Overflight and Augmentext. Others are outputs like this article pointing out that apps are programs which perform a limited set of functions. For the mobile, telecommuter, concentration deprived, and ever too busy knowledge worker, apps are the cat’s pajamas.

Bing is now going to permit app discovery. I would be happier if Bing did these things:

  1. Indexed more substantive content
  2. Eliminated the need for me to search Microsoft research and Bing for information
  3. Provided an interface which allowed me to concentrate on relevant results
  4. Improved relevance
  5. Provided meaningful ways to present data; for example, time sort, date content added to the index, and other pre-pre diluvium operations.

I chuckled at this diagram:

image

I have zero idea what the diagram is supposed to mean. I know that when I tested a Lumia Windows phone, I could not locate apps. The sparseness of information was a turn off. Hey, how tough is it to provide a link to the developer’s Web site? Obviously pretty tough.

The Bing enhancements are part of the “deep linking” craze. The idea is that an app does something and data are usually needed for that something. To allow the app to spit out a result, which may or may not be what the user wants, the app “goes to another Web site” or “to a database”. What’s going on is a dumbing down and conveniencing up of information access. Perfect for a user with an attention span less than a goldfish’s and the reading skill of a bright sixth grader.

How does this work? Well, you use code like this:

image

Don’t worry. Your eyes are not failing. The code snippet was illegible on the Bing blog Web page. New president, same old Microsoft. Enchanting.

Here’s the passage I highlighted in Microsoft blue:

We’re also already in the process of bringing this apps and actions intelligence to Bing and Bing-powered search results including Cortana and Windows 10 and we will have more to share later. In fact, look for an upcoming post on how we will start applying this to our results soon.

Okay, can’t wait. Watch for my in app or in ept article in Information Today. Nah, never mind. You already know that I prefer substantive information access. App finding is a tiny part of the content universe. I want more progress on the more substantive information which is increasingly difficult to find. Use Bing to locate Babak Parviz’s work at Microsoft on the bionic contact lens. Now use Bing to track Dr. Parviz from Google to Amazon. Let me know how that works out for you. Is there an app for that with deep linking no less?

Stephen E Arnold, May 22, 2015

Indexing Rah Rah Rah!

May 4, 2015

Enterprise search is one of the most important features for enterprise content management systems and there is a huge industry for designing and selling taxonomies.  The key selling features for taxonomies are their diversity, accuracy, and quality.  The categories within taxonomies make it easier for people to find their content, but Tech Target’s Search Content Management blog says there is room for improvement in the post “Search-Based Applications Need The Engine Of Taxonomy.”

Taxonomies are used for faceted search, allowing users to expand and limit their search results.  Faceted search gives users a selection of filters to change their results, including file type, key words, and more of the ever popular content categories. Users usually don’t access the categories directly; primarily, the categories are used behind the scenes, and the aggregated results appear on the dashboard.
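
For readers who want to see the mechanics, here is a minimal Python sketch of faceted search. It is an illustration only, not anything taken from the Tech Target post: the documents, the field names (`file_type`, `category`), and the functions `search` and `facet_counts` are all hypothetical. The sketch shows a keyword match, the per-facet counts a dashboard might display, and a narrowed result set after a category filter is applied.

```python
from collections import Counter

# Hypothetical documents tagged with taxonomy categories and file types.
DOCS = [
    {"id": 1, "title": "Q3 sales report", "file_type": "pdf", "category": "Finance"},
    {"id": 2, "title": "Sales onboarding guide", "file_type": "docx", "category": "HR"},
    {"id": 3, "title": "Sales forecast model", "file_type": "xlsx", "category": "Finance"},
    {"id": 4, "title": "Holiday policy", "file_type": "pdf", "category": "HR"},
]


def search(docs, keyword, **filters):
    """Return docs whose title contains keyword and that match every facet filter."""
    hits = [d for d in docs if keyword.lower() in d["title"].lower()]
    for field, value in filters.items():
        hits = [d for d in hits if d[field] == value]
    return hits


def facet_counts(hits, fields=("file_type", "category")):
    """Count how many hits fall under each value of each facet field."""
    return {field: Counter(d[field] for d in hits) for field in fields}


results = search(DOCS, "sales")                       # keyword match only
counts = facet_counts(results)                        # what a dashboard could show
narrowed = search(DOCS, "sales", category="Finance")  # user limits by category
```

The user never touches the taxonomy itself in this sketch. The category values sit on the documents, and the counts and filters are computed behind the scenes.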

Taxonomies, however, take their information from more than what the user provides:

“We are now able to assemble a holistic view of the customer based on information stored across a number of disparate solutions. Search-based applications can also include information about the customer that was inferred from public content sources that the enterprise does not own, such as news feeds, social media and stock prices.”

Whether you know it or not, taxonomies are vital to enterprise search.  Companies that have difficulty finding their content need to consider creating a taxonomy plan or investing in category lists from a proven company.

Whitney Grace, May 4, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Google Search Appliance VS SearchBlox Price and Indexing Limit Comparison

January 16, 2014

The article on SearchBlox titled Google Search Appliance Price Comparison with SearchBlox explains the slippery pricing data available to consumers on Google Search appliances. The article states that for a more limited document storage space Google charges $30,000 while SearchBlox, for unlimited storage, charges only $5,000 (but these numbers are only approximations). SearchBlox also offers more constant support and maintenance than Google, making it a very appealing option in the world of intranet or Web site search.

The article explains:

“SearchBlox provides the option of seamlessly moving away from the Google Search Appliance without skipping a beat. In addition to the cost-savings and feature comparison, scalability of the solution is something to consider given the explosion of content. SearchBlox scales both vertically (by adding more CPU/RAM to the existing setup) and horizontally (by adding more search servers that can be run in a cluster) without disrupting your architecture.”

SearchBlox even allows for Google administrators using XLS with a “faceted search plugin” that promises not to disturb the infrastructure. Allowing users to index unlimited documents certainly beats Google’s 500K indexing limit. A quick check of the GSA Advantage site shows that the Google Search Appliance is a significantly more expensive alternative to the open source based SearchBlox solution.

Chelsea Kerwin, January 16, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

History of Web Indexing: BBC Style

September 4, 2013

I read “Jonathon Fletcher: Forgotten Father of the Search Engine.” I have no quibble with the claims that the first Web crawler was an invention spawned in the United Kingdom.

I did find several interesting factoids in the write up; for example:

  1. Google has indexed more than one trillion pages. On the surface, this sounds just super. However, what is the cost of maintaining the index of the alleged one trillion pages? Is Google cutting corners in its indexing to reduce costs? Perhaps the BBC will expand on this statement. A trillion is a big number and I wonder what percentage of those “pages” are indexed on a daily basis to keep the index fresh.
  2. “Because websites were added to the list manually, there was nothing to track changes to their content. Consequently, many of the links were quickly out-of-date or wrongly labeled.” Is this true today?
  3. “By June of 1994, JumpStation had indexed 275,000 pages. Space constraints forced Mr Fletcher to only index titles and headers of web pages, and not the entire content of the page, but even with this compromise, JumpStation started to struggle under the load.” Decades ago the black hole of Web indexing was visible. Now that Big Data have arrived, won’t indexing costs rise in lock step? What cost savings are available? Perhaps indexing less content and changing the index refresh cycles are expedient actions? Have Bing, Google, and Yandex gone down this path? Perhaps the BBC will follow up on this issue?
  4. “But in my [Fletcher’s] opinion, the Web isn’t going to last forever. But the problem of finding information is.” Has progress been made in Web search?

One interesting aspect of the write up is the conflation of Web search with other types of search. The confusion persists, I believe.

Perhaps the BBC will look into the contributions to search of Dr. Martin Porter, the inventor of the Porter Stemmer. Dr. Porter’s Muscat search technology was important, arguably more important than Mr. Fletcher’s.

Stephen E Arnold, September 4, 2013

Sponsored by Xenky

Database Indexing Explained

July 29, 2013

Finally, everything you need to explain database indexing to your mom over breakfast. Stack Overflow hosts the discussion, “How Does Database Indexing Work?” The original question, posed by a user going by Zenph Yan, asks for an answer at a “database agnostic level.” The lead answer, also submitted by Zenph Yan, makes for a respectable article all by itself. (Asked and answered by the same user? Odd, perhaps, but that is actively encouraged at Stack Overflow.)

Yan clearly defines the subject at hand:

“Indexing is a way of sorting a number of records on multiple fields. Creating an index on a field in a table creates another data structure which holds the field value, and pointer to the record it relates to. This index structure is then sorted, allowing Binary Searches to be performed on it.

“The downside to indexing is that these indexes require additional space on the disk, since the indexes are stored together in a table using the MyISAM engine, this file can quickly reach the size limits of the underlying file system if many fields within the same table are indexed.”
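
The idea in the quoted passage can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how MyISAM or any other engine stores its indexes: the table `ROWS`, the field `last_name`, and the helpers `build_index` and `lookup` are hypothetical, and a list position stands in for the on-disk pointer to a record. The index is a separate, sorted structure of (field value, pointer) pairs, which is what makes a binary search possible.

```python
import bisect

# Hypothetical table: each row's position in the list stands in for its address on disk.
ROWS = [
    {"id": 1, "last_name": "Ng"},
    {"id": 2, "last_name": "Adams"},
    {"id": 3, "last_name": "Zhou"},
    {"id": 4, "last_name": "Adams"},
    {"id": 5, "last_name": "Baker"},
]


def build_index(rows, field):
    """Build a sorted list of (field value, row position) pairs."""
    return sorted((row[field], pos) for pos, row in enumerate(rows))


def lookup(rows, index, value):
    """Binary-search the index for value and return all matching rows."""
    # (value,) sorts just before every (value, pos) pair, so bisect_left
    # lands on the first index entry whose key equals value, if any.
    start = bisect.bisect_left(index, (value,))
    matches = []
    for key, pos in index[start:]:
        if key != value:
            break
        matches.append(rows[pos])
    return matches


LAST_NAME_INDEX = build_index(ROWS, "last_name")
adams = lookup(ROWS, LAST_NAME_INDEX, "Adams")
```

The sketch also shows the downside the quote mentions. `LAST_NAME_INDEX` holds one extra entry per row, and every additional indexed field means another structure of the same size.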

Yan’s explanation also describes why indexing is needed, how it works (with examples), and when it is called for. It is worth checking out for those pondering his question. A couple of other users contributed links to helpful resources. Der U suggests another Stack Overflow discussion, “What do Clustered and Non Clustered Index Actually Mean?”, while one, dohaivu, recommends the site, Use the Index, Luke.

Cynthia Murrell, July 29, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Tag Management Systems Use Governance to Improve Indexing

March 18, 2013

An SEO expert advocates better indexing in the recent article “Top 5 Arguments For Implementing a Tag Management Solution” on Search Engine Watch. The article shares that because of increased functionality and matured capabilities of such systems, tag management is set for a “blowout year” in 2013.

Citing such reasons as ease of modifying tags and cost reduction, it is easy to see how businesses will begin to adopt these systems if they haven’t already. I found the point on code portability and becoming vendor agnostic most appealing:

“As the analytics industry matures, many of us are faced with sharing information between different systems, which can be a huge challenge with respect to back-end integrations. Tag management effectively bridges the gap between several front-end tagging methodologies that can be used to leverage existing development work and easily port information from one script or beacon to another.”

I think this is a very interesting concept and I love the notion of governance as a way to improve indexing. I am reminded of the original method from the days of the library at Ephesus. Next month, the same author will tackle the most common arguments against implementing a tag management system. We will keep an eye out.

Andrea Hayden, March 18, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Google: Objective Indexing and a Possible Weak Spot

February 6, 2013

A reader sent me a link to “Manipulating Google Scholar Citations and Google Scholar Metrics: Simple, Easy, and Tempting.” I am not sure how easy and tempting the process of getting a fake scholarly paper into the Google index is, but the information provided is food for thought. Worth a look, particularly if you are a fan of traditional methods for building a corpus and delivering on point results which the researcher can trust. The notion of “ethics” is an interesting addition to a paper which focuses on fake or misleading research.

Stephen E Arnold, February 7, 2013

Mobile Search Improves as Cloudant Integrates Full Text Indexing

October 16, 2012

The mobile app field is on fire right now as more businesses add Web and mobile applications, and one company is making great strides in mobile search capabilities. Cloudant has announced its cloud database service is adding full-text indexing and search powered by Apache Lucene. We learn in “Cloudant Upgrades Cloud Database Server With Integrated Text Indexing and Search Based on Apache Lucene” on PRNewswire that Cloudant “Search 2.0” allows developers to enhance their Web and mobile apps with full-text search and analysis of documents.

The article continues:

“‘Search 2.0 enables the types of text analytics that just aren’t possible with the limited in-database search capabilities of SQL or other search systems,’ said Mike Miller, co-founder and chief scientist at Cloudant. ‘I can’t think of any application out there that wouldn’t benefit from better search. By drawing on the speed and simplicity of Lucene, we are able to provide developers with an easy, familiar way to do that for large amounts of application data that will perform at-scale for massive amounts of users.’”
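
For context on what full-text indexing inside a document database involves, here is a minimal sketch of an inverted index in Python. It is not Cloudant’s or Lucene’s API, and it leaves out the analysis, ranking, and scaling that a real engine provides. The documents, the `body` field, and the functions `tokenize`, `build_inverted_index`, and `search_all` are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical JSON-style documents, as a document database might store them.
DOCS = {
    "doc1": {"body": "Mobile search for mobile apps"},
    "doc2": {"body": "Full-text indexing with an inverted index"},
    "doc3": {"body": "Search and analysis of application data"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs, field="body"):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc[field]):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return sorted ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


INDEX = build_inverted_index(DOCS)
hits = search_all(INDEX, "mobile search")
```

A query is answered by intersecting the posting sets for its terms, so the cost depends on how many documents contain those terms and not on the total size of the database.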

While Cloudant’s moves in the mobile search field are impressive, our research indicates that accurate enterprise search is still needed in the industry. Intrafind’s enterprise search applications can answer the need to “find information securely.” The company’s iFinder is a basic solution for structured and unstructured enterprise data, allowing users to gain access to the information needed in an enterprise quickly and efficiently.

Andrea Hayden, October 16, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Google and Latent Semantic Indexing: The KnowledgeGraph Play

June 26, 2012

One thing that is always constant is Google changing itself.  Not too long ago Google introduced yet another new tool: Knowledge Graph.  Business2Community spoke highly about how this new application proves the concept of latent semantic indexing in “Keyword Density is Dead…Enter ‘Thing Density.’”  Google’s claim to fame is providing the most relevant search results based on a user’s keywords.  Every time they update their algorithm it is to keep relevancy up.  The new Knowledge Graph allows users to break down their search by clustering related Web sites and finding what LSI exists between the results.  From there the search conducts a secondary search and so on.  Google does this to reflect the natural use of human language, i.e. making their products user friendly.

But this change begs an important question:

“What does it mean for me!? Well first and foremost keyword density is dead, I like to consider the new term to be “Concept Density” or to coin Google’s title to this new development “Thing Density.” Which thankfully my High School English teachers would be happy about. They always told us to not use the same term over and over again but to switch it up throughout our papers. Which is a natural and proper style of writing, and we now know this is how Google is approaching it as well.”

The change means good content and SEO will be rewarded.  This does not change the fact, of course, that Google will probably change their algorithm again in a couple of months, but now they are recognizing that LSI has value.  Most IVPs that provide latent semantic indexing, content and text analytics, such as Content Analyst, have gone way beyond what Google is offering with the latest LSI trends to make data more findable and discover new correlations.

Whitney Grace, June 26, 2012

Sponsored by Content Analyst
