Thursday, October 16, 2008

Opera's MAMA

Opera is planning to launch MAMA, a new kind of search engine, at the end of the year.
MAMA (Metadata Analysis and Mining Application) is a search engine for developers. It helps developers identify the technologies used behind a website, and could help them understand current market trends.
MAMA crawls the Web, but instead of indexing the content of Web sites, as most search engines do, it discards the content and indexes the types of technologies being used on sites, such as Cascading Style Sheets (CSS), Hypertext Markup Language (HTML), XHTML, and so on.
MAMA could be an effective tool for developers and others to identify market trends in the use of these technologies.

Wednesday, October 8, 2008

First-ever Made-in-India Microprocessor!!!


India has launched its first microprocessor for servers, through the world's largest chipmaker, Intel. It is the first x86 microprocessor on Intel's 45nm technology to be designed completely by a team of 300 Indians based at Intel's Bangalore research and development centre.

This new Intel Xeon 7400 series processor has up to six processing cores per chip, 16MB of shared cache memory and 1.9 billion transistors. Platforms based on these processors can scale up to 16 processor 'sockets' to deliver servers with up to 96 processing cores inside, offering scalability, ample computing threads and extensive memory resources, Intel said. It delivers almost 50 percent better performance in some cases, and up to a 10 percent reduction in platform power, the company said.

Praveen Vishakantaiah, President, Intel India, said, “The quality of available talent, technology ecosystem and business potential are factors which make India a strategic business site for Intel. We have achieved a considerable degree of expertise in product design.”

Monday, September 22, 2008

Yahoo Launches New Social Networking Site... SpotM!

In the social networking space, Yahoo is now spreading its wings with SpotM, a social networking site available only in India...

How is it different from other sites?

  • Allows users to define secret friends that their other friends won't be able to see
  • SMS integration and anonymous chat
  • Private conversations with others without revealing their mobile number

How GOOGLE works???

Google is the most efficient and popular search engine... How does it work? Let's see...



Google runs on a distributed network of thousands of low-cost computers and can therefore carry out fast parallel processing. Parallel processing is a method of computation in which many calculations can be performed simultaneously, significantly speeding up data processing. Google has three distinct parts:

  • Googlebot, a web crawler that finds and fetches web pages.
  • The indexer that sorts every word on every page and stores the resulting index of words in a huge database.
  • The query processor, which compares your search query to the index and recommends the documents that it considers most relevant.

Let’s take a closer look at each part.

1. Googlebot

Googlebot is Google’s web crawling robot, which finds and retrieves pages on the web and hands them off to the Google indexer. It’s easy to imagine Googlebot as a little spider scurrying across the strands of cyberspace, but in reality Googlebot doesn’t traverse the web at all. It functions much like your web browser, by sending a request to a web server for a web page, downloading the entire page, then handing it off to Google’s indexer.

Googlebot consists of many computers requesting and fetching pages much more quickly than you can with your web browser. In fact, Googlebot can request thousands of different pages simultaneously. To avoid overwhelming web servers, or crowding out requests from human users, Googlebot deliberately makes requests of each individual web server more slowly than it’s capable of doing.
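This politeness can be pictured as a per-host throttle. Here is a minimal Python sketch of the idea, not Google's actual implementation: `fetch_fn` is a hypothetical function that performs the real download, and the one-second default delay is an illustrative assumption, not Google's real policy.

```python
import time
from urllib.parse import urlparse

class PoliteFetcher:
    """Throttle requests per host so a crawler doesn't overwhelm any
    one server. `fetch_fn` is a hypothetical download function; the
    delay value is an illustrative assumption."""

    def __init__(self, fetch_fn, delay=1.0):
        self.fetch_fn = fetch_fn
        self.delay = delay
        self.last_request = {}   # host -> timestamp of the last request

    def fetch(self, url):
        host = urlparse(url).netloc
        wait = self.last_request.get(host, 0) + self.delay - time.monotonic()
        if wait > 0:
            time.sleep(wait)     # back off before hitting the same host again
        self.last_request[host] = time.monotonic()
        return self.fetch_fn(url)
```

Requests to different hosts proceed at full speed; only repeat requests to the same host are delayed.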

Googlebot finds pages in two ways: through an add URL form, www.google.com/addurl.html, and through finding links by crawling the web.

When Googlebot fetches a page, it culls all the links appearing on the page and adds them to a queue for subsequent crawling. Googlebot tends to encounter little spam because most web authors link only to what they believe are high-quality pages. By harvesting links from every page it encounters, Googlebot can quickly build a list of links that can cover broad reaches of the web. This technique, known as deep crawling, also allows Googlebot to probe deep within individual sites. Because of their massive scale, deep crawls can reach almost every page on the web. Because the web is vast, this can take some time, so some pages may be crawled only once a month.

Although its function is simple, Googlebot must be programmed to handle several challenges. First, since Googlebot sends out simultaneous requests for thousands of pages, the queue of “visit soon” URLs must be constantly examined and compared with URLs already in Google’s index. Duplicates in the queue must be eliminated to prevent Googlebot from fetching the same page again. Second, Googlebot must determine how often to revisit a page. On the one hand, it’s a waste of resources to re-index an unchanged page. On the other hand, Google wants to re-index changed pages to deliver up-to-date results.
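The crawl loop with its “visit soon” queue and duplicate elimination can be sketched in a few lines of Python. This is only an illustration of the queue logic, not Google's implementation: `fetch` and `extract_links` are hypothetical helpers standing in for the real downloading and HTML parsing.

```python
from collections import deque

def crawl(seed_urls, fetch, extract_links, max_pages=100):
    """Breadth-first crawl with duplicate elimination.

    `fetch(url)` returns a page's content and `extract_links(page)`
    returns the URLs it links to -- both are hypothetical stand-ins
    for real downloading and parsing.
    """
    queue = deque(seed_urls)   # the "visit soon" queue
    seen = set(seed_urls)      # URLs already queued or fetched
    fetched = []               # (url, page) pairs handed to the indexer
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        fetched.append((url, page))
        for link in extract_links(page):
            if link not in seen:       # drop duplicates before queueing
                seen.add(link)
                queue.append(link)
    return fetched
```

The `seen` set is what prevents the same page from being fetched twice, even when many pages link to it.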

To keep the index current, Google continuously recrawls popular frequently changing web pages at a rate roughly proportional to how often the pages change. Such crawls keep an index current and are known as fresh crawls.

2. Google’s Indexer

Googlebot gives the indexer the full text of the pages it finds. These pages are stored in Google’s index database. This index is sorted alphabetically by search term, with each index entry storing a list of documents in which the term appears and the location within the text where it occurs. This data structure allows rapid access to documents that contain user query terms.
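The structure described here, a term mapped to the documents containing it and the positions where it occurs, is a classic inverted index. A minimal Python sketch of that data structure (not Google's actual index format) might look like:

```python
from collections import defaultdict

def build_index(documents):
    """Build a positional inverted index: term -> {doc_id: [positions]}.

    `documents` maps a document id to its full text, as if handed
    over by the crawler. Illustrative sketch only.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            # record each position where the word occurs in this document
            index[word].setdefault(doc_id, []).append(position)
    return index
```

Looking up a query term is then a single dictionary access, which is what makes retrieval fast even over a huge collection.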

To improve search performance, Google ignores (doesn’t index) common words called stop words (such as the, is, on, or, of, how, why, as well as certain single digits and single letters). It also ignores some punctuation and multiple spaces, and converts all letters to lowercase.
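That normalization step can be sketched as a small tokenizer. The stop-word list below is a tiny illustrative sample, not Google's actual list, and dropping *all* single characters is a simplification of the "certain single digits and single letters" described above.

```python
import re

# Illustrative sample only -- not Google's real stop-word list.
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why", "a", "an"}

def tokenize(text):
    """Lowercase the text, strip punctuation and extra spaces,
    and drop stop words and single characters."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS and len(w) > 1]
```

Running the crawled text through a step like this before indexing shrinks the index and makes lookups cheaper.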

3. Google’s Query Processor

The query processor has several parts, including the user interface (search box), the “engine” that evaluates queries and matches them to relevant documents, and the results formatter.

PageRank is Google’s system for ranking web pages. A page with a higher PageRank is deemed more important and is more likely to be listed above a page with a lower PageRank.
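A toy version of the published PageRank iteration looks like the following. This follows the original Brin and Page formulation on a small link graph; Google's production ranking combines PageRank with many other signals, and the damping and iteration values here are conventional illustrative choices.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank over a small link graph.

    `links` maps each page to the list of pages it links to.
    Toy sketch of the published formula, not Google's ranking system.
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}   # start with equal rank
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # dangling page: spread its rank across all pages
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                # each page shares its rank equally among its out-links
                for target in outgoing:
                    new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank
```

A page that many other pages link to ends up with a higher rank than a page nothing links to, which matches the intuition in the paragraph above.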

Google also applies machine-learning techniques to improve its performance automatically by learning relationships and associations within the stored data. For example, the spelling correcting system uses such techniques to figure out likely alternative spellings.

Indexing the full text of the web allows Google to go beyond simply matching single search terms. Google gives more priority to pages that have search terms near each other and in the same order as the query. Google can also match multi-word phrases and sentences.
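Phrase matching of this kind is possible because the index stores word positions, as described in the indexer section above. A minimal sketch, assuming an index shaped like `term -> {doc_id: [positions]}`:

```python
def phrase_match(index, phrase):
    """Return the documents where the query words appear adjacently
    and in order, using a positional index of the form
    term -> {doc_id: [positions]}. Illustrative sketch only.
    """
    words = phrase.lower().split()
    postings = [index.get(w, {}) for w in words]
    # only documents containing every word can hold the phrase
    docs = set(postings[0]).intersection(*postings[1:]) if postings else set()
    matches = set()
    for doc in docs:
        for start in postings[0][doc]:
            # check that word i appears exactly i positions after the first
            if all(start + i in postings[i][doc] for i in range(1, len(words))):
                matches.add(doc)
                break
    return matches
```

The same positional data also supports softer proximity scoring: instead of requiring exact adjacency, a ranker can boost documents where the query terms merely appear close together.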