I have several large sites, each with 30,000 to 70,000 articles.
For some unknown reason, over the last two months,
the Google index count has steadily declined, falling from more
than 20,000 indexed articles per site to around 500.

Sites have been up for more than a year and all sitemaps were submitted
and accepted by Google without problems several months ago.

At their peak, the sites were indexed at about 20,000 - 50,000 pages per site.
Considering there are older versions of the pages on the sites, the correct
index count should be over 50,000 pages until Google removes old versions.

Recently, I discovered that Google search does not report the
correct number of indexed pages on SERPs.

For example, for site we see the following results:

Search on

1,250 from

But, once in a while you do get the correct number of pages indexed:

61,900 from

So, if you do a search, the chance of getting a count almost 40 times
lower than what Google has actually indexed is over 98%.

For example:


When you do a search on here is what you get:

1,530 from

Here is the correct result:

37,900 from

Here is a screenshot of the correct index:

When you do a search on domain,
without specifying site:... this is what you get:

Search on

This is what search shows most of the time:

3,790 for

while the correct number is:

1,030,000 for

which is a whopping 250 times higher in reality!

Screenshot for the correct index:

Does anybody know what this is?

Has anybody else seen something of this sort?

Here are the stats for all the sites in question:

Search on

61,800 from - correct result
1,160 from - incorrect result
(off by a whopping 40 times)

Search on domain, without site:...

Results 21 - 30 of about 1,030,000 for  - correct result
Results 1 - 10 of about 3,360 for - incorrect result
(off by a whopping 250 times)


Search on

37,900 from - correct result
1,490 from - incorrect result
(off by 20 times)

Search on domain, without site:...

Results 1 - 10 of about 826,000 for - correct result
Results 1 - 10 of about 4,460 for - incorrect result
(off by 200 times)


Search on

32,800 from - almost correct result, but lower than it should be
2,880 from - incorrect result
(off by 10 times)

Search on domain, without specifying site:...

Results 1 - 10 of about 543,000 for - correct result
Results 1 - 10 of about 4,320 for - incorrect result
(off by > 100 times)


Search on

25,400 from - almost correct result, but lower than it should be
315 from - incorrect result
(off by almost 100 times)

Search on domain, without specifying site:...

Results 1 - 10 of about 513,000 for - correct result
3,060 for - incorrect result
(off by 150 times)


Search on

10,400 from - incorrect, but closer to reality (which should be > 40,000)
3,190 from - incorrect result

Search on domain, without specifying site:...

Results 1 - 10 of about 459,000 for - correct result
Results 1 - 10 of about 6,200 for - incorrect result
(off by > 70 times)


Search on

71,200 from - correct result
15,400 from - incorrect result, but better than other incorrect results

Search on domain, without specifying site:...

Results 1 - 10 of about 471,000 for - correct result
Results 1 - 10 of about 13,600 for - incorrect result
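
The "off by N times" figures above were rounded by eye. As a rough cross-check, the ratios can be recomputed from the reported counts. This is only a sketch: it uses the first three pairs of numbers listed above, and the labels (site 1, site 2, ...) are placeholders, since the domain names are not shown.

```python
# Reported "of about N" counts from the stats above: (high, low) per search.
# The labels are placeholders; the actual domains are not shown in the post.
counts = {
    "site 1, site: search": (61_800, 1_160),
    "site 1, domain search": (1_030_000, 3_360),
    "site 2, site: search": (37_900, 1_490),
    "site 2, domain search": (826_000, 4_460),
    "site 3, site: search": (32_800, 2_880),
    "site 3, domain search": (543_000, 4_320),
}


def ratio(high, low):
    """How many times larger the high count is than the low one."""
    return high / low


for label, (high, low) in counts.items():
    print(f"{label}: {high:,} vs {low:,} -> about {ratio(high, low):.0f}x")
```

The exact ratios differ somewhat from the rounded figures in the post, but the order of magnitude is the same.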

I think it would be best if you consulted a professional to help you resolve the issues you are having with indexing this site.
Google does not index all pages of a site, and it can drop pages over time. There can be many reasons for pages being dropped, and it is hard to say which apply without looking at your site and going through the articles.
Are your articles unique content? Most article sites that take submissions from the public have major problems with duplicate content, because people submit their articles to many different sites. You need to go through your sites using something like Copyscape to find out whether people are submitting original content.
1) Google is not obliged/required to crawl/index anything

2) Google will typically only crawl/index a set % of a site anyway - the % may vary based on things like Trust, Authority, Popularity, Internal Link structure, server responses and time length for responses etc.

3) Crawling does Not = Indexing

4) Indexed does Not = shown in SERPs

5) Google /may/can/does Filter results in the SERPs ...
    It may decide Not to show some URLs if it sees them as Duplicates (full/partial - Internally/Externally and/or due to Canonical issues).
    It may decide Not to show some URLs if it perceives them as being "weak" (little/no content, little/no original content, no internal links, poor response times, poor response history etc.).

6) Results given may vary based on DataCenters - G's info is on multiple networks - the DC you speak to may change based on the response speed of the DCs, your ISP, your Browser, the time of day etc.

7) The figures shown by G tend to be "estimates" or "guesses" - as you click through the Pager Links at the bottom, the figures tend to change.
   (It may say "of about 5000" on page one, go to page 25 and it may say "of about 1200" and if you go to page 79 it may say "of about 800")

8) You may find that Google is actually "consolidating" its figures.   The figures you saw before could have been wild guesses - but now G has had time to properly crawl the site, it has realised that it actually only has X pages, and never had Y pages in the first place!

9) The only way to have a better idea (but still NOT likely to be 100% accurate!) is to click through ALL the pager links!
Due to the size of the site, using a site: operator is likely to be ineffective - instead you should use the domain plus a Directory, or possibly even a SubDirectory...
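
A minimal sketch of the per-directory approach in point 9, assuming a placeholder domain `example.com` and made-up directory names (neither comes from this thread). It only builds the search URLs; you would still open each one and click through the pager links by hand.

```python
from urllib.parse import urlencode


def site_query_url(domain, directory=""):
    """Build a Google search URL for a site: query limited to one directory."""
    target = domain if not directory else f"{domain}/{directory.strip('/')}/"
    return "https://www.google.com/search?" + urlencode({"q": f"site:{target}"})


# Hypothetical directories; substitute the real paths of your own site.
for directory in ["", "chapter1", "chapter2/section3"]:
    print(site_query_url("example.com", directory))
```

Smaller slices give smaller result sets, which makes it more practical to reach the last pager link.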

You know - I'm getting tired of explaining this one .... I need to write up an Auto-Response for it!
OK, I've had a quick look at the site. You have problems with the way you are running the site - you have www and non www versions for the page and you also offer a frames version and a non frames version of the site.
You have definite duplicate content issues, as your site is providing a lot of sample code that is found elsewhere on the web and in a much more user-friendly (and search-engine-friendly) manner.
I would suggest you actually look at the way you are delivering the site to see how you can make it much smoother and more user friendly - and then you need to deliver some unique content.
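
To illustrate the www / non www point: if both hostnames serve the same page, each page exists under two URLs. Here is a rough sketch of grouping a URL list by a normalised form to spot such pairs. The URLs are made up on a placeholder `example.com` domain and are not taken from the site under discussion.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalise(url):
    """Drop the scheme and a leading 'www.' so both host variants compare equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + (parts.path or "/")


def find_host_duplicates(urls):
    """Return groups of URLs that differ only by www / non www (or scheme)."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Hypothetical crawl output, for illustration only.
sample = [
    "http://www.example.com/chapter1/page.html",
    "http://example.com/chapter1/page.html",
    "http://www.example.com/chapter2/other.html",
]
print(find_host_duplicates(sample))
```

The usual remedy for pairs like these is a permanent (301) redirect from one hostname to the other.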
Well, the issue is not whether things go up and down. I don't even question that at the moment.
The issue is why the numbers jump by tens and even hundreds of times?

Articles are unique with exception of some articles being relevant to more than one
chapter, in which case it is included in more than one chapter. But within each
chapter articles are guaranteed to be unique. It is hard to estimate the exact number.

Sure, there are some things that need to be polished, but it is a matter of priorities.
Right now the main priority is identification of the issues related to these dramatic
jumps. Once we figure that one out, we can go and take care of some fine points.
I am not even concerned with optimising sites to get the absolute best in any kind
of ratings or trying to get into #1 position. There is sufficient traffic and it is steady.
But it did drop by 3 times while the number of articles indexed dropped by 30 times.
"you have www and non www versions for the page"

Sorry, can you gimme a reference to that page?
I just checked for that one the other day. Was ok. Not sure it is the same site
you are looking at.

"and you also offer a frames version and a non frames version of the site."

Well, I thought this is a non issue because they are both accessible from
the 2nd level page and site is crawlable via non frame version of the site.
Actually, the chapter index is exactly the same for both versions. The only
thing I can think of to be a problem is that it uses "target", and I was thinking
of creating a separate index page that does not use target, if that will change anything.
Why is the big question over this :)
All we can do is reduce the chances of losing pages in the index, and the web is a dynamic place. Duplicate content is the biggest reason for pages being dropped, but there could also be a time factor involved (though whether 1 year is enough for that I'm not sure).
One of my original sites has over 3,500 pages, but Google now only has 290 pages indexed, and that's a site that's been around well over 10 years. I'm quite sure that somewhere in Google's index there's a popularity factor whereby if no one calls your page for x number of years/months etc. it will get dropped, because even Google cannot hope to store everything that nobody wants.
Indexing can go up and down, and I'm sure that is related in some way to what people are searching for.
Well, I do not even expect indexing to be steady as a rock. Obviously.
It did fluctuate during these 6 months, and quite a bit, which is understandable.

But what I am after, is seeing these jumps by tens and thousands of times
doing the same thing. And there is something very consistent in those jumps
and it IS reproducible.

What happens is this: when you do a search, most of the time it gives you
that lower number, say 1000 pages. But if you use > link on SERP, at
exactly the 4th page, and CONSISTENTLY on the 4th page, it will, all of a sudden
jump tens, if not hundreds of times, say to 10,000 pages.

AND, on the top of it, even if you continue to navigate the SERP,
it would stay at that higher number. Interestingly enough, after it jumps to higher
number, if you do a domain search (not site:...) that number would also jump
accordingly, say from 5,000 pages to 500,000 pages, and would also stay
there even if you navigate thru SERP or randomly select a different starting page.

I AM aware of the fact that SERP index jumps from page to page, and it can
jump dramatically, even by tens of times. But what I am seeing here is a
totally different behaviour. If it was jumping as it usually does, it would not
stay consistent at the higher number once you pass that magic 4th page
in serps, just like these show:

Results 31 - 40 of about 37,900 from (0.13 seconds)
Results 31 - 40 of about 73,400 from
Results 31 - 40 of about 33,200 from
Results 31 - 40 of about 25,400 from


As you can see, it is EXACTLY the same condition.

Now, interestingly enough, you never jump to that much higher number
if you simply keep redoing search by pushing Search button again, no
matter how many times.

And it is not exactly consistent and always reproducible by doing Next
on a SERP. If you are not lucky, and when you hit that magic 4th page
and index did not jump to that higher number, from then on, no matter
what you do, you won't be able to get it to jump to that number.
Thanx, I'll look at that site.
I was mostly dealing with the domain sites as a reference.
There is a lot of work to maintain all these sites and things may and do get out of sync.
By the way, that high number IS consistent. It does not just jump to a completely
different higher number, like if you navigated the SERP. It is ALWAYS the same
exact number for a given site, no matter if you navigate the SERP with next/prev
buttons or randomly chose some SERP page. This is not the same behaviour as
I usually see while doing next/prev.

Also, I have verified quite a few pages by randomly jumping to different SERP
pages and clicking on some link. They were all valid and existing pages from all sorts
of places in the world. Some Indian, Chinese, Japanese, Arabic sites, you
name it. Not sure if I saw a single page that did not exist.

That means that large number is correct. It is not just some kind of bug or
a totally off the wall estimate. Whenever I saw index jump dramatically while
navigating SERP via next/prev link, it ALWAYS jumped lower, never higher,
especially by tens and hundreds of times.

This is a different animal we are dealing with here, and this stuff started happening
relatively recently. I just noticed it within the last week accidentally. I was watching
the index steadily declining during the last couple of months, and pretty radically,
and was trying to figure out why that would be, until, all of a sudden, I did a site:
search and it jumped hundreds of times higher, and stayed there, no matter what I did.

Then, after a couple of hours, if I repeated the search for the same site, it would
jump back to that lower number (with the usual variation of +/- 10% or so), and there
was nothing you could do to make it jump to a higher number, until I magically
stumbled upon it again and noticed that magic 4th page phenomenon.

And that is exactly what I am talking about here.

There are screenshots above. You can look at them. Maybe it will give you some idea.
Who knows?
Basically, the bottom line for me is this:
I have no problems with that higher number, because according to my estimates,
it does correspond to real state of affairs, considering G bot may find things I did
not even suspect. That is all fine with me and I am not questioning any of it.

But the issue is: that higher number is not just an aberration. It does correspond
to what indexing eventually converged to, and the very existence of that much
lower number is an obvious suspect, regardless of all other criteria and factors.
Btw, if you need more sample data of any kind, such as sample history
with exact time stamps and all the exact numbers, just ask. We can get that
in a wink of an eye, including all other screenshots showing the correct
high numbers in SERPs.
Let me try to answer some of these:

1) Google is not obliged/required to crawl/index anything

Not applicable to this case.

2) Google will typically only crawl/index a set % of a site anyway - the % may vary based on things like Trust, Authority, Popularity, Internal Link structure, server responses and time length for responses etc.

Not applicable to this case.

3) Crawling does Not = Indexing

Not applicable to this case.

4) Indexed does Not = shown in SERPs

Not applicable to this case.

5) Google /may/can/does Filter results in the SERPs ...
    It may decide Not to show some URLs if it sees them as Duplicates (full/partial - Internally/Externally and/or due to Canonical issues).
    It may decide Not to show some URLs if it perceives them as being "weak" (little/no content, little/no original content, no internal links, poor response times, poor response history etc.).

Not applicable to this case.

6) Results given may vary based on DataCenters - G's info is on multiple networks - the DC you speak to may change based on the response speed of the DCs, your ISP, your Browser, the time of day etc.

Not applicable to this case.
The variations between different DCs normally produce a totally different picture
in terms of sample deviation and stability of the post-settle period. (Post-settle means
once you hit that higher number, it does not vary from there, while in the situations
you describe, it will indeed change, and relatively wildly, as has been observed
before on numerous occasions.)

7) The figures shown by G tend to be "estimates" or "guesses" - as you click through the Pager Links at the bottom, the figures tend to change.

Not applicable to this case.

   (It may say "of about 5000" on page one, go to page 25 and it may say "of about 1200" and if you go to page 79 it may say "of about 800")

Not applicable to this case.

The highest possible number on SERPs, is the number you hit the first time
you push the Search button. From then on, no matter how you navigate the
SERPs, you can NEVER get the number higher than the initial number you
got when you hit the Search button.

8) You may find that Google is actually "consolidating" its figures.  The figures you saw before could have been wild guesses - but now G has had time to properly crawl the site, it has realised that it actually only has X pages, and never had Y pages in the first place!

Not applicable to this case.

9) The only way to have a better idea (but still NOT likely to be 100% accurate!) is to click through ALL the pager links!
Due to the size of the site, using a site: operator is likely to be ineffective - instead you should use the domain plus a Directory, or possibly even a SubDirectory...

Not applicable to this case.

But thanx for suggestion. I did these kinds of things also.
Hold on one minute.

PLEASE give us some URLs of these "searches".
A Google SERPs URL showing the Low figure
A Google SERPs URL showing the High figure
a Google SERPs URL for the "domain search"

All that data is above, in the initial post for this thread.

If you need some specific data, just tell me exactly what do you want to see.
I'll see what I can get for you and how fast.

Read the initial logs carefully and try to see what it could possibly mean.
All the data is there.
I'm not having you sit there and say "that don't count for me" - not without a * good explanation as to WHY those things are Not Applicable to you.

WHY - are your sites/Pages/URLs magic?
Have you got a gold contract with Google to index Every single * page on your site?

Come on - I want to hear (read ;)) WHY none of the above - that can/does affect a large % of sites out there - doesn't apply to YOU!
One more time: I can get you all the screenshots for site:... and domain types of searches
for the higher numbers. For the lower numbers, you can get yourself using those URLs.
I have all the screenshot images saved in the files. Just tell me which ones you want to see.
Cause there are quite a few of those.
I'll make you a page where you can verify it 100%.
Screenshots do NOT explain WHY your site is so special that it wouldn't suffer ANY of the typical Variances that occur in C/I/R and/or SERPs composition/display.

I want to KNOW why your site/URL is not applicable to such things.

Nope, domain search was done without using the leading http://

Screenshots actually show the exact type of search and the search string.

I am not talking about site being "special" or not.
It is a different issue, the issue of CONSISTENCY of SERPs behavior
under the same exact conditions, regardless of how "good" or "bad" do you think
that site MAY look to G bot.

As far as SERPs variations go, even from navigating by next/prev links,
I AM aware of those and it has been described above.

I am talking about THE SAME EXACT conditions within the context of
the same site. SERPs do not just wildly jump up and down. Again, the
results can only DECREASE from the initial SERP when you hit that
Search button. They can never jump higher from that number, which is
exactly what we are seeing here.
So every time you do a search - you are going to a specific DataCenter - via an IP?
As I'm not seeing that in those images.

What I am seeing is this


That Search criteria means anything that contains that string - whether it is on that domain or Not!
So if ANYONE talks about your site, and mentions your DomainName - then it may be included in those SERPs!

If I went and put "" on one of my sites...
in a few days, that result figure may move up to be
"of about 1,030,001"
instead of the current
"of about 1,030,000"

See that?
It would possibly go up by 1 because I would have put that bit of text - the one that matches what you are searching for - on a page somewhere.
Naff all to do with your site, naff all to do with what Google has indexed from your site.

Is this making any sense at all????
(Because that initial number is the ABSOLUTE BEST number
for that site as seen by particular DC, that happened to serve
that result at that particular instance.

No matter how you navigate the SERPs from then on, you can
not possibly jump to a number on the INITIAL SERP that is
tens or even hundreds of times higher.)

Just read this carefully.
"What I am seeing is this


That Search criteria means anything that contains that string - whether it is on that domain or Not!"

- Not applicable to this case.

Because, whatever that number happens to be, it should always come out more or less
the same, within reasonable variation.

But it should NEVER jump from 4,000 to 3,000,000,
no matter what kind of search you perform.
This is just not how things work.

"If I went and put "" on one of my sites...
in a few days, that result figure may move up to be
"of about 1,030,001"
instead of the current
"of about 1,030,000"

Fine, no argument about that.

That number should NEVER go down to 4,000 from 3,000,000
in a SINGLE session, no matter which DC you happened to have
hit, and no matter what stage of transaction (updating that exact
site on that exact DC at that exact time) happens to be.

It should be consistent within a reasonable range,
which is calculated as follows:

Crawl rate per day (or index update period, irrelevant)
as derived from your rule set divided by total number of pages indexed
on your site, given your particular ranking.

THAT is your expected variation of the outcome.

ALL other conditions are irrelevant and are not applicable.
But you are NOT going to the Same DC every time - are you!
There could be a bottleneck, there could be a recurring issue with your site, there could be various other issues.

But until you are testing it Properly, you do not know.
Go and dig out a Google DC IP.
Start doing your tests on that same DC IP.

In fact - go and pick 3 or 4.
Log the results from those.
Then you may get more insight.
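
Something like the following could be used to keep that log. It is only a sketch: the DC labels are placeholders, the lines are assumed to be copied by hand from the results page in the "Results 1 - 10 of about N" form quoted in this thread, and the assignment of sample lines to labels is arbitrary.

```python
import re

# Matches the estimate in lines such as "Results 1 - 10 of about 3,360 for".
RESULT_LINE = re.compile(r"of about ([\d,]+)")


def parse_estimate(line):
    """Extract the 'of about N' estimate from a results line, or None."""
    match = RESULT_LINE.search(line)
    return int(match.group(1).replace(",", "")) if match else None


def summarise(log):
    """Map each DC label to (min, max) of the estimates recorded for it."""
    summary = {}
    for dc, lines in log.items():
        values = [v for v in map(parse_estimate, lines) if v is not None]
        if values:
            summary[dc] = (min(values), max(values))
    return summary


# Placeholder labels; the sample lines reuse figures posted in this thread.
log = {
    "dc-a": [
        "Results 1 - 10 of about 3,360 for",
        "Results 21 - 30 of about 1,030,000 for",
    ],
    "dc-b": ["Results 1 - 10 of about 4,460 for"],
}
print(summarise(log))
```

If the low and high figures both show up under the same label, the DC is not the explanation; if they split cleanly by label, it probably is.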
To simplify:

Let's say that your site is crawled at a rate of 1,000 pages/day,
which is a realistic number and corresponds to actual observations.

Let's say your site has 100,000 pages indexed.

So, within a 24 hr. period, the MAXIMUM variation of the FIRST page on SERP
you saw after hitting that Search button, should be

NO MORE THAN 99,000 to 101,000 pages.

So, the ABSOLUTE maximum you should see in the best of situations
(which is the estimated value as reported by the SERP) should be no higher than

Nor should it EVER go down to 5,000 within the same 24 hr. period.
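
The same arithmetic as a small sketch. The function name is illustrative, and it rests on the assumption stated above that one day's crawl can add or drop at most the daily crawl rate; it says nothing about how Google actually computes its estimates.

```python
def expected_range(indexed_pages, crawl_rate_per_day, days=1):
    """Bounds on the indexed count if each day's crawl changes it by at most crawl_rate pages."""
    delta = crawl_rate_per_day * days
    return indexed_pages - delta, indexed_pages + delta


low, high = expected_range(100_000, 1_000)
print(f"expected between {low:,} and {high:,} pages")
```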
Furthermore - your "equation" does not include any value, nor variance, for "pushes" to various DCs, nor the possibilities of "roll-backs".
(The latter being more than understandable - as G isn't likely to tell us how often that may happen, the extent it goes to, nor what sites were affected).

Basically - you are viewing it as "flat" or "linear" ... it is not.
It is matrixed and multi-dimensional.
There are Numerous DCs - that get updated at different times, holding data from different times.

Further - if there are any "SERPs compilation changes" due to things like dupes/weak pages etc. - it may not be applied to all DCs at the same time .... it may Roll-out and catch up .... or, if your site is responding poorly on occasion, it may result in "waves" through the DCs as the responses fail/improve/fail/improve .... or those of other sites etc.


"But you are NOT going to the Same DC every time - are you!"

ALL DCs are updated within the same 24 hr. period, more or less,
without knowing the exact scheduling of events.

That means, that ALL of them, at some point during the 24 hr. period
would come to the same number for your particular site, which is
exactly what I have observed up to now. The updates of one of your
domains may create a variation between different DCs DURING THE
TRANSACTION TIME ONLY, which lasts for a given site no longer
than a couple of hours, considering it is done in even smaller chunks.

So, after a couple of hours, ALL DCs stabilize as to your particular site
and all of them should show the same exact number, which is what we
are seeing here all the time.

REGARDLESS of bottlenecks and granularity of transaction.
Meaning, by "grand" transaction as to your particular site, is COMPLETE
update to bring that particular DC to full correspondence with the results
of the latest update.

That "grand" transaction is not necessarily performed all at once,
but is broken into smaller transactions.

After "grand" transaction is finished, ALL DCs contain the same data.

So, whenever you see the variations in your FIRST SERPs page,
that means a "grand" transaction is in progress for that particular domain.
During that time, you WILL experience MAXIMAL variations, that, at the
same time, can not possibly exceed the max. crawling rate given a
particular set of parameters for your ranking, etc.
Of course - we are also discounting the possibility of Resource Preservation - maybe G introduces cut-offs on the allocation of resources for the estimates etc.
maybe it's only done on certain DCs.
maybe it's only done at certain times of day, or on X number of requests.


The short of it is -

IF you want to know what is indexed - to any approximation of accuracy - then you need to do as advised above;
and/or use

AND click through to the last pager link - whilst making the requests to a Specific DCIP !

Understood. I am not viewing this as some kind of flat, linear, single dimensional event.
Trust me.

My own filters are multi-dimensional also. :--}

And I have a pretty good idea what exactly is involved in multi-DC updates,
except it is too technical for this level. We are already way too deep into technical details.

But again, the bottom line is this: It can NEVER jump from 3,000,000 down
to 5,000, no matter what.

Not possible under any circumstances.
You are clearly wrong - as it is happening.
That suggests that there is either a flaw in the reasoning, a flaw in the equation, or a flaw in the information you are using to base this all on.

And - before you argue that point - think on it...
IF the SERPs are showing these varied results - then it IS happening.
Irrefutable proof - correct?
Supplied by yourself - correct?

Therefore - it is possible.

There ARE strategies to assure you are not going to have such drastic deviations.

Basically, either transaction completes,
or it does not.

If it does not, that means your system is broken and is in inconsistent state.
And if THAT is the case, NONE of your existing data could be trusted.

You would have to unwind the transaction and get to the state you were
before it started, and then reschedule it and do it again at a later time,
no matter what kind of resource bottleneck is there.


Transaction does complete, no matter how long does it take depending
on the traffic jams, resource starvation or all sorts of other things.
It all translates in your propagation delay (meaning, the length of time
it takes to complete a "grand" transaction for that particular time).

ALL other cases indicate a fundamentally broken system,
which can never come to a state of consistency.

and I am not going to tell you which is the MAIN one.

So, let us keep it in the same scope of reality.

The program on your box either works, or it does not,
at least as far as major functionality is concerned.

When you write to a file, you reasonably expect ALL the data to be written,
regardless of how small is your memory and how many threads you are
running at the moment. It ALL translates into prop delay, or a TOTAL deadlock,
such as you ran out of disk space and there is nowhere to write.

You ran out of memory and ran out of swap. So no more memory could be
allocated or swapped. You are in a deadlock state.

In a deadlock state G stops responding to ALL sorts of things.
Do you comprehend what we are talking about?
Correct. Only if you could comprehend what it translates into.
And that is about the last thing I would like to go into details about.

What I am seeing is valid data that corresponds to the OUTCOMES
of a set of events and conditions.

WAY out of normal scope of reasonably expected outcomes
under ANY conditions conceivable.
Just don't ask me anything else about more details.
I suggest you review the numbers in the very first post on this thread.
It tells you which numbers are correct and which numbers are incorrect.
And you appear to be making assumptions on how Google handles the push/update process.
Is it fed from a single origin to all others
Is it fed from DC to DC in a predefined sequence
Is it fed from DC to DC in a varied sequence based on load/resource
Is it fed en masse every time, to each DC
Is it fed en masse some of the time, to each DC
etc. etc. etc. etc. etc. etc. etc.

Then you have to examine how they handle Flagging of issues.... if resources are bottoming out, does it slow down or cease?
If it ceases, does it quit and have to restart, or does it quit and resume?
Does ceasing affect others in the chain?

There are LOTS of potentials there.
Do you KNOW how G does it - or are you assuming?
Do you KNOW how G handles any encountered issues - or are you assuming?

And another spanner in the works - go back and look at your calculation regarding how much is indexed and what the range should be.
Where in there does it factor the potential for removal of previously listed data?
You have 10 pages indexed.
You get between 2 and 5 indexed per day.
That gives you, within the next 24 Hours, 10-15.
Then factor in that G may have decided that 6 of the original pages were junk.
That means it may total 4-9 pages - that's less than you started with!

Then add in that you may be talking to a different DC - one that is a little "out of it" ... that may still show data from the Last update period...
so that means you may have 5 to 8 pages indexed.

Now what happens if the "junk" data is sent ahead of the Recently Indexed data?
You may have even fewer pages listed... (technically it could be at -1 for a short period, yes?)


Is that making sense?
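
As a toy sketch of that arithmetic (the function name is made up, and the inputs are the ones from the 10-page example, not real data). Taken strictly, "between 2 and 5" new pages gives slightly narrower bounds than the rough 10-15 and 4-9 quoted, but the point stands: the total can end up below where it started.

```python
def index_range(start, added_min, added_max, removed=0):
    """Range of indexed pages after one day of additions and removals."""
    return start + added_min - removed, start + added_max - removed


print(index_range(10, 2, 5))             # additions only
print(index_range(10, 2, 5, removed=6))  # 6 original pages dropped as junk
```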

And I'm not convinced about the "24 Hours" either.
I've seen some sites take more than 3 weeks to get a change appear in the SERPs.
It depends on how "important" the site is in G's opinion.
So there is yet another factor to include, as it is missing.... what happens if G decides that your site is not as important all of a sudden...
that means it may not only take longer for the update .... but you may get fewer pages crawled ... and possibly even less in the index from that crawl!
Again - you are being assumptive.
How do you know which ones are correct?

Particularly in light of the fact you are doing a search for a "term" and not based on a site.
will yield data for a specific Domain
will yield data from multiple domains.
That means you may be dealing with a damn sight more variance.


I am NOT stating you are wrong.
What I am stating is I think you are not viewing it correctly, not taking into account numerous other potentials, and are generating skewed results.
(that doesn't make you wrong - that just means the examples provided are next to useless)
Now personally - I've had enough.

I've given multiple examples of your equation being non-exhaustive and lacking in additional factors.
I've explained some of those other factors.
I've pointed out that some of your data is non-relational.

You are simply going to sit there and refute it.

Do so at your leisure.

I personally cannot be bothered to waste any more time on someone who isn't going to realise the potential flaws ... and I doubt if anyone else will either.
So please - do sit here and continue to be assumptive.
If you are lucky, someone from G may pop in.


Best of luck in getting an answer though.

"And you appear to be making assumptions on how Google handles the push/update process.
Is it fed from a single origin to all others
Is it fed from DC to DC in a predefined sequence
Is it fed from DC to DC in a varied sequence based on load/resource
Is it fed en masse every time, to each DC
Is it fed en masse some of the time, to each DC
etc. etc. etc. etc. etc. etc. etc."

Correct. All these things were considered.

"You have 10 pages indexed.
You get between 2 and 5 indexed per day.
That gives you, within the next 24 Hours, 10-15.
Then factor in that G may have decided that 6 of the original pages were junk.
That means it may total 4-9 pages - that's less than you started with!"

There are rules in transaction processing, which force you to introduce
only a predictable amount of new data to the system so the whole system
still remains CONSISTENT. The principle of consistency is the MOST
critical principle.

So, at EACH stage of the game, no matter what kind of multi-dimensional
anything you can even begin to conceive, your system MUST remain
CONSISTENT. If, at ANY given point, it is inconsistent, it means DEATH
to the whole system.

That is why databases and SQL work.
Otherwise, all the computers and all the business on the planet Earth would stop.

In multi-dimensional systems, you always have a delta (the smallest part of a transaction).
So, you introduce YOUR delta into some DC's context.
Once that DC updates itself with YOUR delta,
it then provides YOU with ITS delta.
At this point you are both consistent with each other.
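
As an illustration only, the pairwise delta exchange described above can be modeled with two sets of page IDs. This is a toy sketch and not a description of how Google actually propagates index updates between datacenters. The names `exchange_deltas` and `reconcile` and the page IDs are all hypothetical, and the model handles additions only, not removals.

```python
from typing import Set, Tuple


def exchange_deltas(dc_a: Set[str], dc_b: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (pages A is missing, pages B is missing)."""
    return dc_b - dc_a, dc_a - dc_b


def reconcile(dc_a: Set[str], dc_b: Set[str]) -> None:
    """Apply each side's delta to the other, in place."""
    missing_in_a, missing_in_b = exchange_deltas(dc_a, dc_b)
    dc_a |= missing_in_a
    dc_b |= missing_in_b


# Hypothetical index contents of two datacenters before the exchange.
dc_1 = {"page-1", "page-2", "page-3"}
dc_2 = {"page-2", "page-4"}

reconcile(dc_1, dc_2)
print(dc_1 == dc_2)  # after the exchange both sides hold the same pages
print(len(dc_1))
```

In a model like this, a count read from either side before the exchange can differ from the other side only by the size of the delta.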

So, yes, the issues are highly complex. Because there are not just 2 DCs,
but many. But at any given junction, there is either reconciliation between
one to one DCs, or ALL delta DCs against the MASTER DC, or reference DC.

So, transactions may be conducted via three-way exchanges, in smaller
stages. But this is WAY beyond the scope of this thread.

Tell me, is there ANY way a developer from the appropriate department
might get to look at this data?

Is there a developer's forum or something of the kind?
Finally, both lower and higher numbers ARE consistent,
which indicates we are not getting some random data in the middle
of transaction. Otherwise, we would be getting the random lower number.

But that lower number remains the same, within normal variation typical
of updating any single DC.

Level 1
Ok, I see what is going on.
It is a bug.

Anyway, thanks for your feedback.
>>> Ok, I see what is going on. It is a bug.

So what? In which way does this "bug" affect you, or me, or anybody else? In all those threads you opened for this crap I've not once seen any kind of explanation of why it is you freak out over a couple of contradicting numbers. Why don't you stick to monitoring and increasing traffic instead of making a fuss over something you don't have and never will have any control over?

Anyway, I find your examples quite questionable, and don't really understand what you want to say. Whenever you do this search:
(3.800 for the complete string or 1.030.000 for the string and parts of it, your site does by NO means have a million references *LOL* you better forget that erroneous newbie assumption right away *LOL*)

you're not querying for your domain or something, you're searching for the string "mfcgoldmine[dot]uuuq[dot]com" inside the content of your and other sites, whereas searching:
(61.000 pages)

you get the number of indexed pages as a result.

So ... where's the problem with it all?

There is no point discussing numbers larger than 1000 returned by search info, because
they are an estimate, and
you cannot go beyond 1000 in the search results.
Also, the total number can be affected by
duplicate content
found by Googlebot at the time you do the search.
Autocrat is right:
specify a URL that is not indexed in search results, and that you expect to be indexed.
Make your posts shorter, because the way you write your posts
makes it very difficult for people to follow the problems you are raising.

Seriously people - it's Not worth it.
I spent ages last night pointing out the various holes/flaws/problems with the whole thing, the OP isn't interested in being shown they are wrong.
They want to be told they are right ... or that Google is wrong.

I suggest just boycotting this one and letting them get on with it.

(And thanks JM ;) )
Level 1
First of all, dear luzie, Autocrat, cristina and others, who get so angry,
I can just hear stomping of their feet and grinding of their teeth, let me ask you this:

Why are you SO upset about this, like Google's LIFE depended on it?

If you are so convinced of your totally erroneous and utterly inapplicable conclusions,
and I mean EVERY SINGLE ONE OF THEM, then just move on, and do something
creative for once in your lives, instead of engaging in insult, ridicule and outright

What is at stake here for you?
Why does it bother you so much?
Why do you need to attack someone with totally erroneous and insulting conclusions
that do not correspond to reality and hard facts and hard data as presented?

And ALL of you are "Top Contributors".
And ALL of you know the rules of conduct on these forums.

Does THIS kind of behavior create "positive user experience for everyone"?

I hereby make an official request to authorised Google personnel to take
appropriate action to stop this uncalled-for behavior. If these people behave
as they do, and this is not an isolated case, by ANY means, and there is
plenty of evidence on record, then the very motives of their participation on
these forums are suspect.

Furthermore, I would like to mention this:
Several threads related to specific issues related to this thread were removed.


WHO did a removal?

For what exact purpose?

Ok, let us look at one specific argument from your side:


(3.800 for the complete string or 1.030.000 for the string and parts of it,
your site does by NO means have a million references *LOL*
you better forget that erroneous newbie assumption right away *LOL*)"

(61.000 pages)

you get the number of indexed pages as a result.

So ... where's the problem with it all?"

Incorrect and TOTALLY irrelevant to the exact issue on the table.

The problem is that the search result cannot possibly fluctuate as badly
as we see here.

First of all, even if the search engine breaks the string
into 3 different tokens, mfcgoldmine, and especially cppgoldmine, are still
so unique that the statistical chance that your search result will indeed be referring
to one of the sites in question is probably well over 90%. But, in order to reach a
definite conclusion, a detailed study needs to be conducted.

Secondly, there is no stemming involved here, if any of you know what it means.
If you don't, do a google search on search+engine+stemming.

But fine. Let us eliminate that component by performing a
LITERAL string search, that is "".

The CORRECT result for search on ""

Results 1 - 10 of about 1,030,000 for "". (0.32 seconds)

Still, EXACTLY the same as in original case. Furthermore, it is TOTALLY
consistent with ALL previous samples going back several days, which tells you

Snapshot of CORRECT result is:

For incorrect result, that is seen in over 98% of cases, you can perform a
search yourself.

The INCORRECT result for exact same search under exact same conditions
by simply pushing the Search button again is:

Results 1 - 10 of about 3,700 for "". (0.05 seconds)

This is TOTALLY inconsistent. A properly functioning system
can not possibly produce these absolutely astounding variations
of over 60 times for the exact same search under exact same conditions.

Search on ""

Results 1 - 10 of about 460,000 for ""

Clicking on Search button again produces this result.

Results 1 - 10 of about 7,200 for ""


ALL of 1,030,000 for "", within the reasonable
range, considering that this is an estimate, that could conceivably
vary within +/- 10%, DO EXIST in the index.

Therefore, the lower numbers are totally incorrect.

The consequences of such a behaviour on ANY kind of search are hard
to even begin to estimate without conducting a detailed study.
> Several threads related to specific issues related to this thread were removed.

> WHO did a removal?

> For what exact purpose?

Yes, it's a failing of the system that the remover is not named and that no reason is visible. I've asked for a change.

But it was me, because valuable volunteer time was being taken up to no useful purpose. And my fingers are itching again.
Level 1
Correction: the search result for "" is not 60 times off, but 278 times!
And this kind of difference can never happen in a properly functioning system
under ANY conditions conceivable.
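
For reference, the ratios discussed here can be checked with a few lines of arithmetic. This is only a sketch using the result counts quoted in this thread; the function name `discrepancy_ratio` is made up for illustration.

```python
def discrepancy_ratio(high_count: int, low_count: int) -> float:
    """Return how many times larger the high estimate is than the low one."""
    if low_count <= 0:
        raise ValueError("low_count must be positive")
    return high_count / low_count


# Counts quoted in this thread for the same query, repeated moments apart.
first_query = discrepancy_ratio(1_030_000, 3_700)
second_query = discrepancy_ratio(460_000, 7_200)

print(f"first query:  about {first_query:.0f} times")
print(f"second query: about {second_query:.0f} times")
```

The first pair gives roughly 278, the second roughly 64.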
Level 1
Snapshot for CORRECT result for search on ""

All other snapshots of correct results are available and will be presented
in due time.

WARNING: If this thread is removed again, we are going to be dealing
with a totally different set of issues and the results may not necessarily be
the ones you are trying to achieve.
Level 1
Phil Payne,

Thank you for admitting you are the one who removed these threads.

As far as your exact statement:

"But it was me, because valuable volunteer time was being taken up to no useful purpose"


Especially considering the fact that NONE of them have sufficient competence
in the issues of search engine internals. Otherwise, they would be working as
DEVELOPERS at Google making a cool couple of hundred thousand dollars a year.

So, there is no need to even bother.

As I repeatedly stated already: I am only interested in COMPETENT opinion
by someone who can even begin to understand these kinds of discrepancies,
and that is either top level developers at Google or architects that know the
exact mechanics of search engine internals.
Level 1
I would also like to mention the fact that it seems totally inappropriate
to allow personnel not authorised by Google to remove threads,
especially in the context of conflict of interest issues and the very fact
that the issues addressed on these threads lie WAY outside of their
level of competence.
No and I don't need to recommend they stop, because they all seem to have taken that decision on their own.

As regards getting Google's attention - this week you pretty well won't.  The ones who normally monitor this forum are either out of their offices doing other things or on vacation - monitoring by Google is pretty much "on demand" only.

Even the ones that do monitor the forum regularly when they're here have pretty much given up trying to talk to you.  Nothing is broken, nothing needs changing, and even if it did you are only one of tens of millions of webmasters - why break something for ten million to please one? No one else - of all the millions - is having this problem. Believe me - Google is NOT going to change ANYTHING to please you.

It comes back to the fact that you have a spectacularly poor, mostly copied and garbled set of interlinked sites that are of very little use to anyone.
Level 1
Phil Payne,

"Believe me - Google is NOT going to change ANYTHING to please you"

It is not a matter of pleasing anyone. These results refer not to a single
isolated case, but to several cases, and there are reasons to believe
such discrepancies would cause similar discrepancies in other search
engine queries regardless of site.

As to your opinion regarding the quality of those sites and the information
they contain, it is simply inappropriate and not conducive to "positive user
experience for everyone".

Furthermore, it is totally irrelevant as far as exact set of issues discussed are concerned.

As I said before: I did not come here asking for the ways to improve either ratings
or performance. This is a non issue at the moment.
Level 1
Phil Payne,

Also, I'd appreciate if you answer this question:

Are you authorised to make statements on behalf of Google as to what Google
WILL or will NOT do, as your following statement shows:

"Believe me - Google is NOT going to change ANYTHING to please you"
Level 1
Phil Payne,

Furthermore, your very first statement on the original thread you admitted to having removed
(Question: Google index drops like a rock), is this:

"Top Contributor
Webmaster Help Bionic Poster
It's possible "selected from very large archives" is a euphemism for "stolen from the Library of Congress".

I, personally, find these kinds of questions HIGHLY offensive,
and DEMAND an explanation, that can possibly justify this kind of position.

Do you have ANY evidence or facts or reasons to believe that this collection
was either "stolen from the Library of Congress", or a subject of ANY kind
of copyright related issues?

Furthermore, your following statements on the original thread and on this one have
been successfully refuted.

"We don't need to prove it.  Simple numeracy tells us this is an unusual phenomenon. There are around sixty million indexable domains on the planet.  If 1% hit such problems we'd have 600,000 aggrieved webmasters posting here and in every newspaper that's printed.

The simple fact is - you're all on your own.  One in sixty million is fifty times less likely than winning the UK's national lottery."

You can not possibly prove such an assumption and it is TOTALLY invalid to begin with
as has been described in original thread, which YOU, personally, have removed, and they are,
and I quote:

"Not necessarily. First of all, they were all told and MANY times over,
to the point of being zombified, that Google does NOT guarantee anything
and their index MAY and DOES fluctuate, no matter what they think is

Plus, how many webmasters do you think are willing to rock the boat
and tell the business owner he is losing millions because of those
wild fluctuations? Why would anyone in his right mind do that?
You see, it is much more profitable just to keep quiet and pretend you did
not see any of it, cause there is nothing you can do anyway,
instead of rocking the boat and possibly losing his job,
if the boss learns that he lost millions "because of this clueless bozo,
who calls himself SEO".

Get the picture?

Finally, how many webmasters even participate on these forums?
According to the way you do YOUR statistics, it is less than
0.00000000000000000000000000000000001 % of all the
webmasters in that huge ocean called Internet."


"How many web masters even KNOW they are having these wild fluctuations?

How many of them keep the constant watch of their google index?

How many of them take regular snapshots of their statistics,
at least as to the number of total pages indexed by Google
from your sites?

Does google provide the charts of total number of pages indexed
by google from your site even on Google Analytics?

How many webmasters do you think can produce a running
report of their total pages indexed and post it here?

Would you like me to produce one for you and see if YOURS
is as good as mine?"

And you had no argument on it whatsoever.

This is one more chance for you to prove your point
in order to restore your tainted reputation.
Level 1
Phil Payne,

As to your statement:

"It comes back to the fact that you have a spectacularly poor, mostly copied and garbled set of interlinked sites that are of very little use to anyone."

It is simply outrageous. Simple as that, and the EXACT information has been provided to you
on threads you have removed.

These sites happen to be the REFERENCE sites for a number of Universities
and other educational institutions.

These sites are REGULARLY visited by the biggest software houses in the world,
such as Microsoft, Sun, Intel, HP and other big names in software,
hardware, business, banking and finance, leading world manufacturing corporations,
governments and even the military.

These are probably the cleanest sites on the net as far as producing PURE content
without a single ad and on cleanest pages, that do not have ANY kind of visual
garbage, whose sole purpose is to milk their sites for ad revenue, which is exactly
why MOST of the sites on the net contain very little information on a page as a ratio
of useful information to total size of a page.

Some of the "top ranking" sites contain less than 10% of on-topic, useful information
on every single page. It goes as far as having 2-3 sentences relating to the issue and
topic, and pages' worth of all sorts of advertising spam.

There are PLENTY of reasons to believe that the top ranking sites, at least as far as
issues covered by Goldmine collections go, are in fact the biggest spamming sites
there are.

Vast majority of information they provide is nothing more than a marketing spam.

This particular issue is one of the central issues of Goldmine collection organization.
The article pages contain NOTHING but exact information, extracted with the most
sophisticated filtering technology that exists at this junction, and are GUARANTEED
to correspond to a chapter Title with WELL over 90% certainty.

The amount of most useful information and relationship between useful, hard to find,
competent, etc. information and its ratio to the total number of articles on a given
chapter's topic, is probably the highest you can find ANY place on the net in the
context of similar information.

The number of practical code examples and snippets on subjects covered by the
collections is simply unprecedented, and allows one to find the answer to ANY
conceivable issue or the most difficult problem one might have.

The VARIETY of examples, views, expert opinions on ANY given subject or topic
reflects the best of the best, the state of the art and is probably the most valuable
collection of similar information existing on the planet Earth at this particular junction.


Well, because of a simple fact:
"If we don't have it, it probably does not exist".

Add to it: and if you find a more precise collection of similar information ANYWHERE
on the net, that includes this kind of coverage, depth and precision, including, but not
limited to Google's own collection, considering the ratio of valuable/total information,
Microsoft, Sun, Intel, IBM, or you name it, I would be curious to see your references.

VAST majority of similar collections are simply a garbage dump, that contains every
single article regardless of its appropriateness to a given subject or topic.

The chances of you finding truly useful information in that garbage dump are less
than 1% in most cases, if not much worse than that.

So, I find these kinds of remarks by TOTALLY incompetent individuals,
such as all those, foaming at their mouth and throwing around all sorts of mud,
insults, harassments, humiliation and ridicule, totally off base, totally ungrounded
and totally incompetent.
Level 1
All articles in collection archives are guaranteed to be unique with 100% certainty.

All articles in any chapter are guaranteed to be unique with 100% certainty.

SOME articles may appear in more than one chapter if the issues covered by
that particular article are DIRECTLY applicable to a different chapter with well
over 90% certainty, which is unprecedented for similar collections on the net.

ALL article pages are validated under the strictest HTML standards possible,
and that is HTML 4.01 Strict.

Yes, as a result of recent changes one or two articles out of an average of 50,000
articles in these collections do indeed have validation errors that,
nevertheless, do not affect the page rendering or the ability of Google bot to
index these collections to the FULL extent. There is a guaranteed path from the
top level index page to every single article in the collection, regardless of what kind
of browser is used and whether CSS is enabled.

There is no link stuffing, hidden text meant to be exploited in order to artificially
cause the page rank to go higher, or any other tricks used to artificially inflate
the ratings.

As far as "farms" go, the argument of the opposing side is totally invalid and
has been explained in the articles removed.

Each of these collections has at least one mirror site, which is 100% duplicate
of the original site. It has exact same index pages, exact same articles and
exact same everything.

There is no benefit to having a mirror site in terms of any kinds of rating.
No matter how many mirrors are there, there is only one article that can be
viewed, and it does not matter from which mirror. The page view count is
not going to go higher just because a particular page was accessed from
a different mirror.

Mirrors are used extensively on the net to increase reliability and decrease
the traffic load.

Web has an inherent problem related to single point of failure.
If page is delivered on a single site, then any kinds of attacks on that site
can cause the whole information library to be off line.

Since these collections are used by professional programmers 24 hrs./day,
and those programmers have the toughest issues to resolve in the shortest
possible time frame, it is IMPERATIVE these collections be protected by
reliable mirrors. Our mirrors are some of the most reliable mirrors on the net
and page load time is one of the best in the industry for some very specific
reasons, that INHERENTLY make these mirrors the most reliable mirrors
possible, because they do not allow ANY kind of executable content on these
sites, including, but not limited to PHP, any kinds of scripts, shell access, etc.
The ONLY thing allowed on these mirrors is the simplest non-executable SSI
statements, and THAT is one of the reasons these are some of the most
reliable sites on the net.

Furthermore, Google bot can EASILY discover that these mirrors are in fact
mirrors and not just some tricks to inflate the ratings.

First of all, it is the site names:

MFC/VC/ATL/STL collection:

C++/Visual C collection:

Java Collection:

Javascript collection: (this one does not even have mirror)

On the top of it, ALL of these collections clearly belong to the same
Google webmaster account. There are no tricks used to hide ANYTHING.

So, Google has MULTIPLE ways and means to distinguish the essence
of "interlinking" as far as any conceivable aspect goes.

Finally, if Google decides to penalize the valid mirrors,
the net effect on the Internet will be disastrous.

First of all, it will drastically reduce the availability of most collections
of information and all sorts of distribution channels. Some of the most
valuable resources on the net will simply become inaccessible as they
will be dropped from Google index, which, by now, is the biggest resource
on the net and is recognized as a #1 choice for all searches on the net.
Demand all you like sunshine.

You've been told your logic is faulty.
You've been told that you are not including all the relevant factors.
You've been told that your method of examining references is incorrect.

If you are too damn stupid to accept all of that - from multiple people ... and to just get on with it - that's down to you.

Do NOT expect anyone else to make the effort to aid you.
Do NOT expect anyone else to bother giving you any attention.

If I see you making multiple Topics about the same STUFFING THING - I'll * well delete/report!

Am I Clear?

(And I'm not kidding - I'm sick to damn death of your whinging and refusing to acknowledge you're wrong - but that is your choice.  What I don't have to put up with is you trashing this forum/community, nor upsetting other posters or regulars whilst you're being stubborn!)

Level 1
I find these kinds of remarks totally off base and not in line with
"positive user experience" issues and guidelines.

As for the "link stuffing", giving the exact information in the argument
related to site interlinking, so everyone could see exactly what we are
talking about, it simply looks strange, especially considering the fact
that these exact URLs already appear in the same thread, only in a
different context.

Finally, again, there is no need to get upset to the point of blowing up.
If you, personally, do not find this thread of interest to you, there is no
need to interfere or even bother about it.

The issues ARE valid and specific evidence was provided, and,
as has been stated before, not a single opposing argument so far corresponds
to the exact issue being discussed in this thread.

Just relax. Why be so worked up about it,
especially if it has nothing to do with your problems?

Or do you have some kind of vested interest in this information being suppressed?

Your reactions seem to be too strange, too overboard and too uncalled for.

Does anyone bother you?
Does anyone ask for YOUR particular opinion?

You presented your position and it has been reviewed and evaluated to full extent.
Level 1
Nice picture. I think some of the top programmers in the world that use the
Programmer's Goldmine collections would find it pretty entertaining and
representative of the "pleasant user experience for everyone" slogan.
Level 1

Since YOU have posted this picture, you probably know what is the meaning of it.

I would just like to ask a question:

Why did you put up a picture of a young African American guy,
whose hands could be tied behind his back,
and who has a football stuffed into his mouth?

Is it the message to all people?
Is this, in your opinion, what Google thinks about its customers and users?

Interestingly enough a couple of "Top Contributors" even clicked on "Yes" button,
next to "Do you think this answers the question?"

You guys seem to have plenty of sense of humour.

Does it make you feel better about yourself?
Like you are some kind of "elite", who have the authority not only to insult,
ridicule, humiliate and harass people as you do here all the time it seems,
but, for some strange reason, are even allowed to delete other people's posts!

An unusually broad authority given to you by Google, I'd say.
I am just curious, how does Google select its "Top Contributors"
and what are their authorities and relationship to Google in general?

Are some of you PAID for your noble efforts to help those "clueless"
webmasters all day long?
> The issues ARE valid and specific evidence was provided, and,
> as has been stated before, not a single opposing argument so far corresponds
> to the exact issue being discussed in this thread.

Hello ?
This is your early wake up call from REALITY!

As stated (now several times) your methodology is FLAWED.
Scroll up.
Look at the response ticked as Best Answer.
Look at who+what ticked it.

A Google Employee has visited.
A GE has made a judgement.

Take the * hint!

Seriously - if you want to go on
and on
and On
and ON
about this - I suggest signing up for an account at DigitalPoint.

Level 1
And webado and Becky Sharpe consider it "Best answer"?

Impressive, I tell you.
Level 1
Ok, will do that. Let me see here...

Autocrat, Becky Sharpe, webado, Phil Payne, cristina, luzie, Kevin-UK
think this is the "Best answer".

Basically ALL the "Top Contributors" think the same way.

Well, I guess the majority opinion DOES define Truth.
If everyone thinks the Earth is still flat, then it MUST be!

If everyone thinks that the planet Earth IS the centre of the Universe,
then it MUST be.

Otherwise, they would not burn some guys for making such proclamations.


There is just one very little "but":

You see, properly working systems I know of, and I know plenty about that stuff,
do not work like this. Otherwise... Everything is nice and kosher indeed.
Level 1
But, it's been a pleasure to hear some enlightening views as to Google internals,
the distributed nature of DC (Datacenters), the procedures and principles of
transaction updates, roll-backs and all sorts of other useful things, including
the difference between the and ""
searches, that, for some strange reason, showed exactly the same result
Let me correct you:
If we all think some guy is a delusional moron, by Jove, he is indeed that!
By the by, the picture I posted was a visual depiction of my advice to you to put a sock in it! Get it?
Level 1

Not only "delusional", but "moron"?

Hey, that IS "conducive to positive user experience for everyone",
especially if all of you, Top Contributors think this way.
> By the by, the picture I posted was a visual depiction of my advice to you to put a sock in it! Get it?

It does actually have the search argument you used on the right.  Not rocket science.
You can make out that you know what you are talking about as much as you like.
But anyone reading this (sympathies to them) who sees the Searches you were making will KNOW you are clueless.
Anyone examining your method of calculation will also spot the numerous screwups, miscalculations, lacking factors etc.

In short - no one is going to think you are smart nor knowledgeable on this.

Too many errors, too many assumptions.
I'm not a data engineer - and I can see how far fetched your approach is.

I honestly pray a Google Engineer pops in on this.

Just so you can shut your * cake-hole
Level 1
Well, interesting results.


Results 1 - 10 of about 5,850 from for threads


Results 1 - 10 of about 1,250 from

I wonder if anyone can explain THIS one.
What mere mortals would probably conclude is that, as the SERPs show, there are
more articles in a single chapter than in the entire site.

Which one is right in this case?
Are they BOTH right?
One is right and the other one is wrong?
BOTH wrong?
None of the above?
And ALL of above included?

"And now, the Truth has been spoken"
-- Sankaracharya, India 5000 B.C.
This pointless post is preventing other more important posts from ppl with GENUINE issues from getting exposure in the help forum.

Level 1
Here is another nice one:

(just like one of the Top Contributors suggested)

Results 1 - 10 of about 4,850 from (0.18 seconds)


Results 1 - 10 of about 1,250 from

Well, more articles in a single chapter than on ENTIRE SITE!!!

Level 1
This one is also pretty good:


Results 1 - 10 of about 625 from (0.08 seconds)


Results 1 - 10 of about 1,250 from

Well, just one chapter contains half the articles on the entire site.

Sorry, but there are more than 100 chapters on the site.
Level 1
So is this one:


Results 1 - 10 of about 684 from (0.05 seconds)

Again, just one chapter contains more than half the articles on the site!
Level 1
And this one looks just as good:


Results 1 - 10 of about 1,040 from (0.27 seconds)

Well, about 90% of the entire site is in one chapter!
Level 1
And so is this one:


Results 1 - 10 of about 1,110 from (0.19 seconds)

But the whole site has only how many articles?

Results 1 - 10 of about 1,250 from
Level 1
And how about this one:

Results 1 - 10 of about 1,430 from (0.12 seconds)

Sorry, but the entire site has only 1,250 articles...
Level 1
Well, this one did not quite make it to be larger than the whole site.
Sorry about it.


Results 1 - 10 of about 1,020 from (0.19 seconds)

Now, they all seem to be competing to beat the whole Mother Site!
Level 3

My God!

They are all trying to beat the whole site!


Results 1 - 10 of about 1,670 from (0.14 seconds)

Beats the whole site EASY!!!
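
The comparison running through the last several posts can be written down as a simple check: a count restricted to one chapter should not exceed the count for the whole site. Below is a minimal sketch using the estimates quoted above. The variable names are made up for illustration, and the chapters are left unlabeled because the actual queries are not shown.

```python
# Site-wide estimate quoted above (the low figure seen most of the time).
SITE_ESTIMATE = 1_250

# Per-chapter estimates quoted in the preceding posts, in order of appearance.
chapter_estimates = [5_850, 4_850, 625, 684, 1_040, 1_110, 1_430, 1_020, 1_670]

# A chapter is a subset of the site, so its count should never be larger.
exceeding = [count for count in chapter_estimates if count > SITE_ESTIMATE]
total = sum(chapter_estimates)

print(f"chapters larger than the whole site: {len(exceeding)} of {len(chapter_estimates)}")
print(f"sum of the chapter estimates: {total:,}")
print(f"site-wide estimate: {SITE_ESTIMATE:,}")
```

Since some articles may appear in more than one chapter, the sum is only an upper bound on distinct pages. A single chapter exceeding the site-wide figure is the inconsistency being pointed at.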
Level 1
Shall we try the cppgoldmine?

That one is going to be a knockout!
Level 1
Now, WHO said "They probably NEVER existed" or something of a kind?

Lemme see here...

Oh, I see, familiar faces, Top Contributors, those same people, that for some strange reason
delete a perfectly valid thread after all "Top Contributors" had enough fun insulting, humiliating,
harassing and abusing.

Interestingly enough, the Google employee came and told me something along the lines:
"be nice, these are nice forums, and all nice user experience should not be interrupted".
Sure, this is not a literal quote, but we can dig that one up easily.

Anyway, here we go, and I quote:

Phil Payne
Top Contributor
Webmaster Help Bionic Poster

"> But can you tell me the possible reason for such a rapid blips in index (from 1410 to 59,600 and back to 1060, all in one day?

It probably never happened.

I see 15,400 pages."

Yep, I know what you see in 98% of cases from my statistical estimates.

Now, can you add those few chapters above from that single site and
Level 1

Just relax. This isn't exactly the end of the world.
This issue HAS been discussed already and I did mention it more than once
that these are EXACT same sites. Every single site has at least one mirror,
and that is exactly as I want it, except it wouldn't be a bad idea to add more

This is used for reliability purpose. EXCLUSIVELY. There is no benefit
whatsoever from having a mirror. Because when user finds some article,
it does not matter which mirror he is accessing. The total page impression
count is not going to change.

If anything, Google has to make sure it can handle these kinds of situations
without penalising the mirrors all over the world.

Secondly, this is the way it has been for almost a year, and there has never
been any kind of problem as far as sites being properly indexed by Google.
This is a recent phenomenon.

Google can EASILY determine that these are in fact mirrors and there is no
trick of any kind used to bump up anything.

Furthermore, ALL these sites were created via the exact same account on
webmaster pages. Nothing was done to even attempt to "cover up the tracks".
The simplest domain name analysis shows that these are the exact same sites.
The index pages are exactly the same, and so are all the chapter index pages
for every single chapter, and so is every single article in each of these sites
(assuming we are talking about the same collection).

And if you think THIS could be considered as some kind of "evil", then what
is going to happen is some of the biggest depositories of information in the
world are going to just disappear from the Google index, and the consequences
of it one cannot even BEGIN to comprehend.

In summary, there is absolutely nothing "wrong" with having the mirror sites.
Google has been given so many clues as to what they are, that it is the
simplest thing in the world to make the proper adjustments in the Google index.

Furthermore, Google ALREADY has provisions to handle these kinds of situations
via "Similar pages" links that are indented.

What happens if someone does a search and clicks on a link and that site is
off line for all sorts of reasons? Well, then they have the option of clicking on
the mirror link if they think this article has the potential to answer their question.

But ALL of this has absolutely nothing to do with the issues on the table.
Again: the issue is how come Google SERPs show WILDLY different results,
by as much as 1000 times, for the same exact search under the same exact conditions,
from the same exact machine? Just by clicking on the Search button, you may get either
3,000 articles or 3,000,000. How is THAT kind of thing possible?
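
To put a number on that gap, here is a minimal Python sketch. The function name discrepancy_factor is made up for illustration, and the two counts are just the round figures quoted above, not fresh measurements.

```python
def discrepancy_factor(low_count: int, high_count: int) -> float:
    """Return how many times larger high_count is than low_count."""
    if low_count <= 0:
        raise ValueError("low_count must be positive")
    return high_count / low_count


# Round figures from the post: about 3,000 versus about 3,000,000 results
# reported for the same search.
factor = discrepancy_factor(3_000, 3_000_000)
print(f"The two counts differ by a factor of {factor:,.0f}")
```

The same helper can be pointed at any pair of counts taken from the screenshots to compare how far apart the low and high readings are.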

And how come those results are stable over a period of weeks,
within the reasonable and expected normal variation as various DCs get updated

As to the "questionable formula", you need to keep things in perspective.
We are talking about the SAME exact conditions and the same exact search.
And I insist: the results should NEVER, under ANY conditions, vary so wildly
in a properly functioning system. REGARDLESS of what kind of DC update
is going on, and what the views from the various DCs are of the same exact site.
Yes, there ARE some normal variations. Because the DCs may all have different
data at the exact same time and are not synchronized sufficiently. But that has been
taken into account in my analysis.

Simple as that.

Finally, this "sunshine" language is clearly offensive, just like every single response
of yours. It is getting kinda boring. Can you come up with something more creative?

One more time: to this very moment, you have been unable to present a SINGLE argument
that has ANYTHING to do with the exact problem we are talking about.

Level 1

We are not discussing the copyright issues on this thread.
If ANYONE has ANY copyright related issues with these sites,
they know what to do, better than you.

Do YOU have a problem with copyright issues?

Why don't you listen carefully to what is being said?

And that is if you can even manage to talk to Google to begin with.

And you know what? Surprise, surprise.
Level 1

Thank you for answering my question regarding relationship to Google.

That means to me that you are not authorised to make ANY kind of
presentations on behalf of Google, its search engine mechanics, or
search engine internals.

ALL your opinions are just that, opinions, based on rumors.

Otherwise, I could just look up the exact description on some Google page
that describes the mechanics in no uncertain and very specific terms.

Do you happen to have a link to such a page?
Top Contributor
Webmaster Help Bionic Poster
And again - you are showing that you have not got a Clue.

Seriously - Go and PAY a professional.
Go on.

I'm sure after having someone explain ALL of the above to you for money - you will start to pay attention.

No - I'm not authorised - but I'm clearly more knowledgeable on the subject than you are.
Level 1
My poor Autocrat,

Why did you have to waste all this time doing the most useless thing possible,
and that is trying to "prove" that these are mirrors? This has been already discussed,
and not only once.

Secondly, it has absolutely nothing to do with the exact issues on the table,
and claims, supported by the hardest evidence there is, the screenshots,
at least as far as my ability to access Google's internal information and databases goes.
Level 1

I do not deny that you have plenty of useful experience and know all sorts of tricks
that are useful from an SEO standpoint, and I am not questioning any of it, or trying
to discredit the value of what you do, regardless of what your underlying motives are.

I am not saying you are just an idiot that does not have a clue.
At the same time, I do not see you even looking at the data from the appropriate angle,
nor do you truly understand what you are saying as far as search engine mechanics go.
Level 1
Just to keep things in proper perspective.

Without bragging too much as to how "great I am", I can tell you this much:

I do understand what I am saying, and I do understand distributed systems,
synchronization issues, database theory, transaction processing, the nature of information
and search engine internals, the principles of file system design, and PLENTY of other
things that ARE applicable to the exact issues we are discussing here, down to the nitty
gritty of it.

Hope you can appreciate that.
Level 1
Just to put a fine touch to all these "you are just dumb" things you are constantly blabbering,
I can tell you this: I consulted for some of the biggest names in the industry, such as
HP, Intel, SGI (Silicon Graphics), Amdahl, Fujitsu (HAL Computers), Lynx and a few
others on a contract basis, doing kernel level development work and being paid
the money you can only hope to ever make, and all the projects I did for them
were done on time and with the best results possible, and in MOST cases, results
that exceeded their normal expected results several times over.

Hope that rings at least some bells in your cockpit.

So, let's keep things in proper perspective.
Top Contributor
Webmaster Help Bionic Poster
Just keep on going.
At the end of the day - this topic ranks for your sites.
Everyone can see what you are like, and will realise that you are clueless.

If your intent was to harangue Google and make a name for yourself, congratulations - you partially succeeded.
Rather than causing Google a nightmare - you got us.
Yet you did make a name for yourself - and it's quite long and unpleasant.

You are a laughing stock.
Your complete inability to understand all the above points that can/do influence the results seen, and the various incorrect assumptions, show you as nothing more than an idiot.

so please - keep it up.
It's your reputation, your sites etc.

None of us care.

I've been the only one to make a serious effort to assist you - and I'm tired of it.
So now you can sit and suffer.
Watch your results fluctuate, vary, and Dwindle (as they will!) over time.

I know I'm going to enjoy watching it :D
Top Contributor
Webmaster Help Bionic Poster

Sorry buddy, but that's NOT how you are supposed to implement a distributed system.


You don't allow multiple sites to be indexed for the same thing.


One site only will be indexed and you direct traffic by a proper load balancing method to whichever server happens to be available, less busy, etc, in a totally transparent manner to the visitor (be it human or robot).
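
As a rough illustration of that idea, here is a minimal Python sketch of a "pick whichever server is available and least busy" rule. Everything in it is an assumption made for the example: the Backend class, the pick_backend function and the mirror names are hypothetical, and a real setup would normally rely on a dedicated load balancer rather than application code like this.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str                 # hypothetical server label
    healthy: bool             # whether the server currently responds
    active_connections: int   # rough measure of how busy it is


def pick_backend(backends):
    """Return the healthy backend with the fewest active connections."""
    available = [b for b in backends if b.healthy]
    if not available:
        raise RuntimeError("no healthy backend available")
    return min(available, key=lambda b: b.active_connections)


# Made-up example pool: one server is down, the other two are up.
backends = [
    Backend("mirror-a", healthy=True, active_connections=12),
    Backend("mirror-b", healthy=False, active_connections=0),
    Backend("mirror-c", healthy=True, active_connections=4),
]
print(pick_backend(backends).name)
```

The visitor only ever sees one address; which machine answers is decided behind it.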


Of course this is not a cheap method. It's not at the disposal of every Tom, Dick and Harry to use.  I cannot afford it. If my server is down or too busy, I can only wait until it's sorted. I don't have the resources to go fancy. But I would never set up mirror sites and let them all get indexed, this is idiotic in the extreme. You're cutting your own branch by introducing so much superfluous material that all of it will be trashed to varying degrees. Think quality vs quantity. Think long and hard about that.


And this doesn't even begin to address the other problem, that your published content, even if it existed on only ONE of your sites, ALSO exists verbatim on many other sites on the web which are not controlled by you (if they were, this would just add to the same issue as your mirror sites have). You are aggregating content from many sources. This therefore means it's not original (not original to your site), not unique and ultimately doesn't deserve to be indexed in the minutest detail and even less to rank at all.

Top Contributor
Webmaster Help Bionic Poster
Wasting keystrokes there ...webado... :(
Level 1

I do understand what you are saying.

But there is a pretty interesting twist on all this.

The issue is public domain information. Who, and by what criteria, is to be included
in the index if ALL of them have the same articles? Well, according to Google,
the biggest and the baddest. But do they "own" ANY of it?

Secondly, consider this: Most, if not all, of the sites I am aware of that carry the
same exact information simply have it in bulk. Searching THAT database
will produce pretty poor performance in terms of the appropriateness of some
article to the search query, and the very quality of the articles that show up
in some search if some particular site has priority over all others.

For example, if you are interested in the "abstract class" issue and would like
to see some specific code examples describing some pretty subtle points,
when you do a search, you may get ALL sorts of articles on abstract classes,
and about 99% of them will be totally off base as to the original query.

But... If you do not simply include the entire historical archive on the site,
but filter the information with sophisticated filters, what YOU are going to
have as a result is ORDERS of magnitude more precise and to the point
information than anyone else, including Google, Yahoo and Microsoft.

You may never even get to be indexed if Google considers it "duplicate content",
despite the fact that you have the best collection of the most on-topic articles with
the most extensive collection of code examples on ANY issue conceivable
that exists on the face of the planet Earth at this particular juncture.

And I can tell you more: Google itself cannot possibly produce anything equally
as precise, no matter what they do. Because their filtering technology is totally
outdated, and in the most fundamental principles. They use methods and techniques
that have been known since the sixties of the previous century. Yes, they did improve
some things, but in incremental terms. They cannot possibly find the information
we can find, and for them to be able to do that, they would have to spend several
years on totally changing the entire architecture. Because very little of their
current technology is capable of utilizing these principles, which, for obvious reasons,
I am not going to discuss in detail.


What IS the benefit of the current view on information?
WHO is to be shown in the search results and according to what criteria and principles?

Yes, this issue does have relevance and there is no easy solution, even from a logical
standpoint, and yes, there is no point in including every single copy of the same
article. Otherwise, the users will simply be overwhelmed.


This is not the end of the equation. It is just the beginning of it.
Just take for example the simplest issue of article formatting.
If you look at the biggest players in the world, their article formatting
is not necessarily the best you can find. With their current formatting,
it is much more difficult to see who says what and who responds to whom,
etc. Plus, as far as pure formatting goes, they just copy the original article
with all those weird special characters to the output page. If you look at
the exact same article in my collections, the difference is night and day.

And again, if you do a search on code examples for abstract class in the
Microsoft or Google database, you are going to get hundreds of thousands of articles,
the majority of which do not even have anything to do with abstract classes as such.
Because their very technology is not capable of doing some fine and sophisticated

The very traffic on the goldmine sites is a living indication of the validity of such an approach.
Penalising these new technologies using the methodology of dealing with crooks can
hardly produce any benefits.


Makes sense?

I bet not!
Level 1

I am done with you. You had your chance and I did look at your arguments.
I doubt you can produce anything else beyond what you already did.

So, spare your grief. Move on to better things.
Level 1
