Volunteers can download the Grub web crawler, which runs in the background on their PC, indexing web pages according to their content. The crawler will be used as the basis for Wikia’s forthcoming search service. By contrast, search engines like Google run their own web crawlers and keep details of the way they work secret.
Category Archives: Search
Data is not information unless you can find it. And information can't become knowledge without the means to use it. Access must yield meaningful results in order to turn raw bits into valuable, actionable business information. I like to think about it this way: libraries house rich caches of data in the form of books, but finding the exact piece of information you seek requires some level of research and library-science expertise. Wouldn't it be ideal if the information you seek could somehow be collected and organized for you, in the format and context in which you want it? And wouldn't it be even more helpful if it was assembled for you not just from the local collection, but from libraries in Singapore, Milan, Minneapolis and Copenhagen? Some early examples of this new approach to information management illuminate its enormous possibilities.
Being buried in data isn't a new problem, but the issue has grown exponentially in recent years as more and more data pours through corporate networks and the Internet. IDC recently dubbed this phenomenon the "digital big bang," and a quick look at data growth shows why. In digital terms, 161 exabytes (or 161 billion gigabytes) of information was created, captured and replicated in 2006. By 2010, that number will explode to an estimated 988 exabytes. Much of this data will be created by you and me, individuals: IDC found that about 70 percent of it is created by end users, much of it over the Web. In one day, YouTube streams more than 100 million videos, while 1 billion MP3 files are shared over the Internet daily. The increasing convenience and ubiquity of digital devices also add to the explosion.
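As a quick sanity check on those figures, the jump from 161 exabytes in 2006 to an estimated 988 exabytes in 2010 implies annual growth of roughly 57 percent. A few lines of arithmetic, using only the numbers quoted above, confirm both the gigabyte conversion and the growth rate:

```python
# Back-of-the-envelope check of the IDC figures quoted above.
EB_2006 = 161   # exabytes created, captured and replicated in 2006
EB_2010 = 988   # IDC's estimate for 2010
YEARS = 2010 - 2006

# One exabyte is 10**9 gigabytes, so 161 EB really is 161 billion GB.
gigabytes_2006 = EB_2006 * 10**9

# Compound annual growth rate implied by the two data points.
cagr = (EB_2010 / EB_2006) ** (1 / YEARS) - 1

print(f"2006 volume: {gigabytes_2006:,} GB")
print(f"implied annual growth: {cagr:.0%}")   # roughly 57%
```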
Web 2.0 flips the information delivery model upside down: it's now about global access and information at your fingertips, aggregated from sources that you don't necessarily know about or care where they exist. Based on a set of search criteria, information in all its rich forms (media, video, audio, images, documents, text) will be assembled in context and delivered to users and applications for a real-time experience… As information is effectively harnessed, hidden insights will appear that were previously buried in mountains of unorganized data, and more and smarter discoveries will result.
Read this perspective from EMC's Chief Development Officer, Mark Lewis (via C|Net).
So it might not make for interesting content (which is key to a good blog in the long run), but repeat posting on an organization or individual whose actions you oppose is one way to gain that person's attention and, potentially (if you start to get noticed by Google), to own them on Google and other popular search engines… New content, frequently posted on the same person, increasing traffic (which I'll write about shortly), and boom, before you (and your targeted evil-doer) know it, you own them on Google.
We spend most of our time online searching for information. This is not surprising, since the Web is a vast sea of information where finding exactly what you are looking for is not easy. But why is it that when we find something on one site, it is still not easy to find it on another? Say you found a Harry Potter book on Barnes and Noble; why is it still hard to find the same item on other sites like Amazon and Powells? Why is search a one-time deal?
We are used to a Web where each site has its own copy of the information. Each web site is a silo. But that does not need to be the case. If web sites agreed on how to represent things like books, music, movies, travel destinations and gadgets, then we would spend a lot less time searching. Imagine that the Harry Potter Goblet of Fire book had a single standard URL, the same on every site that carries it.
In other words, if there were a standard way to turn things into URLs, finding information would be a lot easier.
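As a rough sketch of the proposal: books already carry a shared identifier, the ISBN, and `urn:isbn:` is its registered URN namespace (RFC 3187). The per-store resolver below is purely hypothetical, invented to illustrate how one canonical identifier could make the same item findable across sites:

```python
def canonical_book_id(isbn):
    """Normalize an ISBN and express it as a urn:isbn URN (RFC 3187)."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    return f"urn:isbn:{digits}"

def store_url(store, isbn):
    """Hypothetical resolver: any store could serve the item at a
    predictable address derived from the shared identifier."""
    return f"http://{store}/book/{canonical_book_id(isbn)}"

isbn = "0-439-13959-7"   # an illustrative ISBN, hyphenated as printed
print(canonical_book_id(isbn))            # urn:isbn:0439139597
print(store_url("amazon.com", isbn))
print(store_url("powells.com", isbn))
```

Under a convention like this, one lookup would work against every store, instead of a fresh search per site.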
From Standard URLs – Proposal for a Web with Less Search at the Read/Write Web
What if an employer wants to terminate an employee simply based on information it learns as a result of a Google or equivalent search? The answer here may be a classic lawyer’s answer: it depends….The employee argued that “his guaranteed right to fundamental fairness was seriously violated” when his supervisor used Google to search his name and learned that he previously had been removed from a position by the U.S. Air Force. He was concerned that she improperly considered this information.
Read this article from C|Net. Previously from WNM: The resume you're unaware of: MySpace, Blogs, Online videos and For Some, Online Persona Undermines a Résumé.
At this week’s Searchology conference at its headquarters in Mountain View, California, Google announced its move to what the company calls “Universal Search” — the integration of images, video, and other data types (who would ever have thought that “news” was a data type?) into what was formerly just the results page for Google text searches. With Universal Search, if you want to know about a particular movie there is a fair chance the top results may actually INCLUDE the movie, in its entirety.
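Google has not published how Universal Search blends result types, but the general idea of merging separately ranked verticals into one results page can be sketched as follows; every score, title, and vertical name here is invented for illustration:

```python
import heapq

def blend(verticals, k=5):
    """Merge per-vertical (score, title) result lists into one top-k page."""
    merged = [
        (score, title, source)
        for source, results in verticals.items()
        for score, title in results
    ]
    return heapq.nlargest(k, merged)   # highest scores first, across types

page = blend({
    "web":   [(0.92, "Movie review"), (0.55, "Fan site")],
    "video": [(0.97, "The movie itself"), (0.40, "Trailer")],
    "news":  [(0.70, "Box-office story")],
}, k=3)

for score, title, source in page:
    print(f"{score:.2f}  [{source}]  {title}")
```

The real difficulty, which this toy glosses over, is making scores from different verticals comparable in the first place.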
Universal Search is Google's attempt to destroy its major competitors who, like Gorbachev in the waning years of the USSR, have to follow suit and start spending money they don't have if they want to even appear to still be in competition with Google. For these companies this means more software development, more sweeps of the web, and a greater likelihood that their own top results will include pages located at Google properties like YouTube.
Read Robert Cringely’s thoughts on Universal Search as well as coverage from C|Net and relevant news affirming that it’s OK to use thumbnail images in search results.
Contribute your knowledge of “Universal Search” to the Whats New Media Wiki.
Raghavan started with the premise that people don't want to search, but rather to get tasks done. Search engines spend very little time servicing you compared to the time you spend doing queries, evaluating results, and so on. This is backwards: why shouldn't machines be working harder than we are? He proposes that the grand challenge is to devise general platforms for semantic search, that is, searches that are able to derive meaning from the search terms presented to them.
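One way to picture the gap between matching search terms and deriving meaning from them: a task-oriented engine would first parse the query into a structured intent rather than just retrieve documents. The rules and intent names in this toy sketch are entirely invented:

```python
import re

# Each rule maps a surface pattern to an invented, structured intent.
RULES = [
    (re.compile(r"^weather in (?P<city>.+)$"), "get_forecast"),
    (re.compile(r"^flights? to (?P<city>.+)$"), "book_travel"),
]

def parse_intent(query):
    """Turn a free-text query into a task description, falling back
    to ordinary document retrieval when no rule applies."""
    q = query.strip().lower()
    for pattern, intent in RULES:
        match = pattern.match(q)
        if match:
            return {"intent": intent, **match.groupdict()}
    return {"intent": "web_search", "query": q}

print(parse_intent("Weather in Copenhagen"))   # structured task
print(parse_intent("grub web crawler"))        # plain retrieval
```

A real semantic platform would replace the hand-written patterns with learned models, but the division of labor, machine does the interpreting, is the same.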
Read this item from ZDNet's Between the Lines.