I have several large sites, each with 30,000 to 70,000 articles.
For some unknown reason, over the last two months the Google index
has been steadily declining, falling from more than 20,000 indexed
articles per site to around 500.
The sites have been up for more than a year, and all sitemaps were submitted
to and accepted by Google without problems several months ago.
At the high point, the sites were indexed at about 20,000 - 50,000 pages per site.
Since older versions of the pages are still on the sites, the correct
index count should be over 50,000 pages until Google removes the old versions.
Recently I discovered that Google search does not report a consistent
number of indexed pages on its SERPs.
For example, for the site http://mfcgoldmine.uuuq.com we see the following results:
Search on site:mfcgoldmine.uuuq.com:
1,250 from mfcgoldmine.uuuq.com
But once in a while you do get the correct number of pages indexed:
61,900 from mfcgoldmine.uuuq.com
So when you run the search, the chance of getting a count almost 40 times
lower than what Google has actually indexed is over 98%.
For example, for the site http://mfcgoldmine.by.ru, a search on
site:mfcgoldmine.by.ru produces the result shown in the screenshot
(not reproduced here).
Here are the stats for all the sites in question:
http://mfcgoldmine.uuuq.com
Search on site:mfcgoldmine.uuuq.com
61,800 from mfcgoldmine.uuuq.com - correct result
1,160 from mfcgoldmine.uuuq.com - incorrect result
(off by a whopping 40 times)
Search on mfcgoldmine.uuuq.com domain, without site:...
Results 21 - 30 of about 1,030,000 for mfcgoldmine.uuuq.com - correct result
Results 1 - 10 of about 3,360 for mfcgoldmine.uuuq.com - incorrect result
(off by a whopping 250 times)
Site http://mfcgoldmine.by.ru
Search on site:mfcgoldmine.by.ru
37,900 from mfcgoldmine.by.ru - correct result
1,490 from mfcgoldmine.by.ru - incorrect result
(off by 20 times)
Search on mfcgoldmine.by.ru domain, without site:...
Results 1 - 10 of about 826,000 for mfcgoldmine.by.ru - correct result
Results 1 - 10 of about 4,460 for mfcgoldmine.by.ru - incorrect result
(off by 200 times)
Site http://cppgoldmine.uuuq.com
Search on site:cppgoldmine.uuuq.com
32,800 from cppgoldmine.uuuq.com - almost correct result, but lower than it should be
2,880 from cppgoldmine.uuuq.com - incorrect result
(off by 10 times)
Search on cppgoldmine.uuuq.com domain, without specifying site:...
Results 1 - 10 of about 543,000 for cppgoldmine.uuuq.com - correct result
Results 1 - 10 of about 4,320 for cppgoldmine.uuuq.com - incorrect result
(off by > 100 times)
Site: http://cppgoldmine.by.ru
Search on site:cppgoldmine.by.ru
25,400 from cppgoldmine.by.ru - almost correct result, but lower than it should be
315 from cppgoldmine.by.ru - incorrect result
(off by almost 100 times)
Search on cppgoldmine.by.ru domain, without specifying site:...
Results 1 - 10 of about 513,000 for cppgoldmine.by.ru - correct result
3,060 for cppgoldmine.by.ru - incorrect result
(off by 150 times)
Site http://javagoldmine.uuuq.com
Search on site:javagoldmine.uuuq.com
10,400 from javagoldmine.uuuq.com - incorrect, but closer to reality (which should be > 40,000)
3,190 from javagoldmine.uuuq.com - incorrect result
Search on javagoldmine.uuuq.com domain, without specifying site:...
Results 1 - 10 of about 459,000 for javagoldmine.uuuq.com - correct result
Results 1 - 10 of about 6,200 for javagoldmine.uuuq.com - incorrect result
(off by > 70 times)
Site http://tarkus01.by.ru
Search on site:tarkus01.by.ru
71,200 from tarkus01.by.ru - correct result
15,400 from tarkus01.by.ru - incorrect result, but better than other incorrect results
Search on tarkus01.by.ru domain, without specifying site:...
Results 1 - 10 of about 471,000 for javagoldmine.by.ru - correct result
Results 1 - 10 of about 13,600 for tarkus01.by.ru - incorrect result
Google does not index all pages of a site, and it can drop pages over time.
There can be many reasons for pages being dropped, and it is hard to say
which applies without looking at your site and going through the articles.
Are your articles unique content? Most article sites that take
submissions from the public have major problems with duplicate content,
because people submit their articles to many different sites.
You need to go through your sites with something like Copyscape to find
out whether people are submitting original content.
Kevin
Top Contributor - Webmaster Help Bionic Poster - 7/4/09
1 person says this answers the question:
1) Google is not obliged/required to crawl/index anything.
2) Google will typically only crawl/index a set % of a site anyway
- the % may vary based on things like Trust, Authority, Popularity,
Internal Link structure, server responses and response times etc.
3) Crawling does Not = Indexing.
4) Indexed does Not = shown in SERPs.
5) Google may/can/does Filter results in the SERPs ...
It may decide Not to show some URLs if it sees them as Duplicates
(full/partial - Internally/Externally and/or due to Canonical issues).
It may decide Not to show some URLs if it perceives them as being "weak"
(little/no content, little/no original content, no internal links, poor response times,
poor response history etc.).
6) Results given may vary based on DataCenters - G's info is on multiple networks
- the DC you speak to may change based on the response speed of the DCs, your ISP,
your Browser, the time of day etc.
7) The figures shown by G tend to be "estimates" or "guesses"
- as you click through the Pager Links at the bottom, the figures tend to change.
(It may say "of about 5000" on page one, go to page 25 and it may say
"of about 1200", and if you go to page 79 it may say "of about 800".)
8) You may find that Google is actually "consolidating" its figures.
The figures you saw before could have been wild guesses - but now that G has had time
to properly crawl the site, it has realised that it actually only has X pages,
and never had Y pages in the first place!
9) The only way to have a better idea (but still NOT likely to be 100% accurate!)
is to click through ALL the pager links!
Due to the size of the site, using the site: operator is likely to be ineffective
- instead you should use the domain plus a Directory, or possibly even a SubDirectory...
site:yoursite.com/directory1/ site:yoursite.com/directory1/subdirectory1a/ etc.
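To make that click-through less tedious, here is a rough Python sketch of how one might log the "of about N" estimate from each results page. The URL format and the "of about" phrasing are assumptions based on the SERP lines quoted in this thread; scraping Google this way can break at any time and may be against its terms of service.

```python
import re
import urllib.parse

def build_serp_url(query, start=0):
    """Google result-page URL for the given query, 10 results per page."""
    return "http://www.google.com/search?" + urllib.parse.urlencode(
        {"q": query, "start": start})

def parse_estimate(serp_text):
    """Pull the 'of about N' estimate out of a results page, if present."""
    m = re.search(r"of about ([\d,]+)", serp_text)
    return int(m.group(1).replace(",", "")) if m else None
```

Fetching `build_serp_url("site:yoursite.com", start)` for start = 0, 10, 20, 30, ... and printing `parse_estimate(...)` for each page would turn the reported 4th-page jump into a log entry instead of something you have to catch by eye.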
OK, I've had a quick look at the site. You have problems with the way you
are running it - you have www and non-www versions of the pages, and
you also offer a frames version and a non-frames version of the site.
You have definite duplicate-content issues, as your site provides
a lot of sample code that is found elsewhere on the web in a much more
user-friendly (and search-engine-friendly) form.
I would suggest you look at the way you are delivering the site
to see how you can make it smoother and more user-friendly -
and then you need to deliver some unique content.
Kevin
Well, the issue is not whether things go up and down. I don't even question that at the moment.
The issue is why the numbers jump by tens and even hundreds of times.
The articles are unique, except that an article relevant to more than one
chapter is included in each of those chapters. Within each chapter the
articles are guaranteed to be unique. It is hard to estimate the exact number.
Sure, there are some things that need to be polished, but that is a matter of priorities.
Right now the main priority is identifying the cause of these dramatic
jumps. Once we figure that out, we can go and take care of the finer points.
I am not even concerned with optimising the sites for the absolute best
ratings or trying to get into the #1 position. There is sufficient traffic and it is steady.
But it did drop threefold while the number of articles indexed dropped thirtyfold.
Sorry, can you give me a reference to that page?
I checked that one just the other day and it was OK. I'm not sure it is
the same site you are looking at.
"and you also offer a frames version and a non frames version of the site."
Well, I thought this is a non issue because they are both accessible from
the 2nd level page and site is crawlable via non frame version of the site.
Actually, the chapter index is exactly the same for both version. The only
thing I can think of to be a problem is that it uses "target", and I was thinking
of creating a separate index page that does not use target, if that will change anything.
All we can do is reduce the chances of losing pages in the index; the web
is a dynamic place. Duplicate content is the biggest reason for pages being
dropped, but there could also be a time factor involved (though whether 1 year
is enough for that, I'm not sure).
One of my original sites has over 3,500 pages, but Google now has only
290 pages indexed, and that's a site that has been around well over 10 years.
I'm quite sure that somewhere in the Google index there's a popularity factor
whereby, if no one calls up your page for x number of months or years, it will get
dropped, because even Google cannot hope to store everything that nobody wants.
Indexing can go up and down, and I'm sure that is related in some way to
what people are searching for.
Kevin
Well, I do not expect indexing to be steady as a rock. Obviously.
It did fluctuate during these 6 months, and quite a bit, which is understandable.
But what I am after is these jumps by tens and even hundreds of times
while doing the same thing. And there is something very consistent in those
jumps, and it IS reproducible.
What happens is this: when you do a search, most of the time it gives you
the lower number, say 1,000 pages. But if you use the > link on the SERP, at
exactly the 4th page, and CONSISTENTLY on the 4th page, it will all of a sudden
jump tens, if not hundreds, of times, say to 10,000 pages.
AND, on top of that, even if you continue to navigate the SERPs,
it stays at that higher number. Interestingly enough, after it jumps to the higher
number, if you do a domain search (not site:...), that number also jumps
accordingly, say from 5,000 pages to 500,000 pages, and also stays
there even if you navigate through the SERPs or randomly select a different starting page.
I AM aware of the fact that the SERP estimate changes from page to page, and it can
change dramatically, even by tens of times. But what I am seeing here is
totally different behaviour. If it were jumping the way it usually does, it would not
stay consistent at the higher number once you pass that magic 4th page
in the SERPs, as these show:
Results 31 - 40 of about 37,900 from mfcgoldmine.by.ru. (0.13 seconds)
Results 31 - 40 of about 73,400 from tarkus01.by.ru
Results 31 - 40 of about 33,200 from cppgoldmine.uuuq.com
Results 31 - 40 of about 25,400 from cppgoldmine.by.ru
etc.
As you can see, it is EXACTLY the same condition.
Now, interestingly enough, you never jump to that much higher number
if you simply keep redoing the search by pushing the Search button again, no
matter how many times.
And it is not always reproducible by doing Next on a SERP. If you are not
lucky, and the count did not jump to the higher number when you hit that
magic 4th page, then from then on, no matter what you do, you won't be
able to get it to jump to that number.
Thanks, I'll look at that site.
I was mostly dealing with the uuuq.com domain sites as a reference.
There is a lot of work in maintaining all these sites, and things may and do get out of sync.
By the way, that high number IS consistent. It does not jump to a completely
different higher number, as it would if you were just navigating the SERPs. It is
ALWAYS the same exact number for a given site, whether you navigate the SERPs
with the next/prev buttons or randomly choose some SERP page. This is not the
behaviour I usually see while doing next/prev.
Also, I have verified quite a few pages by randomly jumping to different SERPs
and clicking on some link. They were all valid, existing pages from all sorts
of places in the world - Indian, Chinese, Japanese, Arabic sites, you name it.
I'm not sure I saw a single page that did not exist.
That means the large number is correct. It is not just some kind of bug or
a totally off-the-wall estimate. Whenever I have seen the estimate jump
dramatically while navigating a SERP via the next/prev links, it ALWAYS jumped
lower, never higher, and especially not by tens or hundreds of times.
This is a different animal we are dealing with here, and this started happening
relatively recently. I noticed it accidentally within the last week. I had been
watching the index steadily and quite radically declining over the last couple
of months, and was trying to figure out why that would be, until all of a sudden
I did a site: search and it jumped hundreds of times higher, and stayed there
no matter what I did.
Then, after a couple of hours, if I repeated the search for the same site, it would
jump back to the lower number (with the usual variation of +/- 10% or so), and there
was nothing I could do to make it jump to the higher number, until I magically
stumbled upon it again and noticed that magic 4th-page phenomenon.
And that is exactly what I am talking about here.
There are screenshots above. You can look at them. Maybe they will give you
some idea. Who knows?
Basically, the bottom line for me is this:
I have no problem with the higher number, because according to my estimates
it does correspond to the real state of affairs, considering the G bot may find
things I did not even suspect. That is all fine with me and I am not questioning any of it.
But the issue is this: the higher number is not just an aberration. It corresponds
to what the indexing eventually converged to, and the very existence of that much
lower number is the obvious suspect, regardless of all other criteria and factors.
By the way, if you need more sample data of any kind, such as a sample history
with exact timestamps and all the exact numbers, just ask. We can get that
in the wink of an eye, including all the other screenshots showing the correct
high numbers in the SERPs.
1) Google is not obliged/required to crawl/index anything
Not applicable to this case.
2) Google will typically only crawl/index a set % of a site anyway - the % may
vary based on things like Trust, Authority, Popularity, Internal Link structure,
server responses and response times etc.
Not applicable to this case.
3) Crawling does Not = Indexing
Not applicable to this case.
4) Indexed does Not = shown in SERPs
Not applicable to this case.
5) Google may/can/does Filter results in the SERPs ...
It may decide Not to show some URLs if it sees them as Duplicates
(full/partial - Internally/Externally and/or due to Canonical issues).
It may decide Not to show some URLs if it perceives them as
being "weak" (little/no content, little/no original content,
no internal links, poor response times, poor response history etc.).
Not applicable to this case.
6) Results given may vary based on DataCenters - G's info is on multiple networks
- the DC you speak to may change based on the response speed of the DCs, your ISP,
your Browser, the time of day etc.
Not applicable to this case.
The variations between different DCs normally produce a totally different picture
in terms of sample deviation and the stability of the post-settle period. (Post-settle
means that once you hit that higher number, it does not vary from there, while in the
situations you describe it will indeed change, and relatively wildly, as has been
observed on numerous occasions before.)
7) The figures shown by G tend to be "estimates" or "guesses"
- as you click through the Pager Links at the bottom, the figures tend to change.
Not applicable to this case.
(It may say "of about 5000" on page one, go to page 25 and
it may say "of about 1200", and if you go to page 79 it may say "of about 800")
Not applicable to this case.
The highest possible number on the SERPs is the number you hit the first time
you push the Search button. From then on, no matter how you navigate the
SERPs, you can NEVER get a number higher than the initial number you
got when you hit the Search button.
8) You may find that Google is actually "consolidating" its figures.
The figures you saw before could have been wild guesses - but now that G has had time
to properly crawl the site, it has realised that it actually only has X pages,
and never had Y pages in the first place!
Not applicable to this case.
9) The only way to have a better idea (but still NOT likely to be 100% accurate!)
is to click through ALL the pager links!
Due to the size of the site, using the site: operator is likely to be ineffective -
instead you should use the domain plus a Directory, or possibly even a SubDirectory...
site:yoursite.com/directory1/
site:yoursite.com/directory1/subdirectory1a/
etc.
Not applicable to this case.
But thanks for the suggestion. I have tried these kinds of things as well.
Top Contributor - Webmaster Help Bionic Poster - 7/4/09
Hold on one minute.
PLEASE give us some URLs of these "searches":
A Google SERPs URL showing the Low figure,
A Google SERPs URL showing the High figure,
A Google SERPs URL for the "domain search".
One more time: I can get you all the screenshots for the site:... and domain
types of searches showing the higher numbers. The lower numbers you can get
yourself using those URLs.
I have all the screenshot images saved in files. Just tell me which ones you
want to see, because there are quite a few of them.
I'll make you a page where you can verify it 100%.
Top Contributor - Webmaster Help Bionic Poster - 7/4/09
Screenshots do NOT explain WHY your site is so special that it wouldn't suffer ANY
of the typical variances that occur in crawling/indexing/ranking and/or
SERPs composition/display.
I want to KNOW why your site/URLs are not subject to such things.
I am not talking about the site being "special" or not.
It is a different issue - the issue of the CONSISTENCY of SERP behaviour
under the same exact conditions, regardless of how "good" or "bad" you think
that site MAY look to the G bot.
As far as SERP variations go, even from navigating by next/prev links,
I AM aware of those, and they have been described above.
I am talking about THE SAME EXACT conditions within the context of
the same site. SERPs do not just wildly jump up and down. Again, the
results can only DECREASE from the initial SERP you get when you hit that
Search button. They should never jump higher than that number, which is
exactly what we are seeing here.
That search criterion matches anything that contains that string - whether it is on that domain or not!
So if ANYONE talks about your site and mentions your domain name, then that page may be included in those SERPs!
If I went and put "mfcgoldmine.uuuq.com" on one of my sites...
in a few days, that result figure might move up to
"of about 1,030,001"
instead of the current
"of about 1,030,000".
See that?
It would go up by 1 because I would have put that bit of text -
the one that matches what you are searching for - on a page somewhere.
Naff all to do with your site, naff all to do with what Google has indexed from your site.
Is this making any sense at all?
(Because that initial number is the ABSOLUTE BEST number
for that site as seen by the particular DC that happened to serve
that result at that particular instant.
No matter how you navigate the SERPs from then on, you cannot
possibly jump to a number tens or even hundreds of times higher
than the one on the INITIAL SERP.)
"What I am seeing is this
http://img10.yfrog.com/img10/8379/mfcgoldmineuuuqcomgoogl.png
WHICH IS WRONG!!!!
That Search criteria means anything that contains that string - whether
it is on that domain or Not!"
- Not applicable to this case.
Because, whatever that number happens to be, it should always come out more or less
the same, within reasonable variation.
But it should NEVER jump from 4,000 to 3,000,000,
no matter what kind of search you perform.
This just not how things work.
"If I went and put "mfcgoldmineuuuq.com" on one of my sites...
in a few days, that result figure may move up to be
"of about 1,030,001"
instead of hte current
"of about 1,030,000"
"
Fine, no argument about that.
But...
That number should NEVER go down to 4,000 from 3,000,000
in a SINGLE session, no matter which DC you happened to have
hit, and no matter what stage of transaction (updating that exact
site on that exact DC at that exact time) happens to be.
It should be consistent within a reasonable range,
which is calculated as follows:
Crawl rate per day (or index update period, irrelevant)
as derived from your rule set divided by total number of pages indexed
on your site, given your particular ranking.
THAT is your expected variation of the outcome.
ALL other conditions are irrelevant and are not applicable.
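The expected-variation rule described in the preceding paragraph can be written out directly. The function below is simply that verbal formula turned into code; the figures in the comment are hypothetical, not measured from any of the sites in this thread.

```python
def expected_variation(crawl_rate_per_day, pages_indexed):
    """Expected relative day-to-day swing in the indexed-page count:
    pages (re)crawled per day as a fraction of the total index size."""
    return crawl_rate_per_day / pages_indexed

# Hypothetical figures: 500 pages crawled per day against a 50,000-page
# index gives an expected swing of about 1% - nowhere near a 40x jump.
print(expected_variation(500, 50000))  # 0.01
```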
Top Contributor - Webmaster Help Bionic Poster - 7/4/09
But you are NOT going to the same DC every time - are you!
There could be a bottleneck, there could be a recurring issue with your site,
there could be various other issues.
But until you are testing it properly, you do not know.
Go and dig out a Google DC IP.
Start doing your tests on that same DC IP.
In fact, go and pick 3 or 4.
Log the results from those.
Then you may get more insight.
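A minimal way to run that logging, assuming you have already dug out one or more DC IPs (the address used in the example below is a placeholder, not a real Google DC): pin each request to a fixed IP by connecting to it directly and sending the normal Host header, then record each observed estimate with a timestamp.

```python
import time
import urllib.parse

def build_dc_request(dc_ip, query):
    """URL and headers that pin one query to one datacenter IP.
    dc_ip is a placeholder - substitute an address you have verified."""
    url = "http://%s/search?%s" % (dc_ip, urllib.parse.urlencode({"q": query}))
    return url, {"Host": "www.google.com"}

def log_line(dc_ip, query, estimate, now=None):
    """One tab-separated record for the suggested results log."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "\t".join([stamp, dc_ip, query, str(estimate)])
```

Repeating the same query against each fixed IP every hour or so, and comparing the logged estimates per DC, would separate genuine per-DC differences from the 4th-page jump described earlier.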
Top Contributor - Webmaster Help Bionic Poster - 7/4/09
Furthermore, your "equation" does not include any value, nor variance,
for "pushes" to the various DCs, nor for the possibility of "roll-backs".
(The latter being more than understandable - G isn't likely to tell us how often
that may happen, the extent it goes to, nor which sites were affected.)
Top Contributor - Webmaster Help Bionic Poster - 7/4/09
Basically, you are viewing it as "flat" or "linear"... it is not.
It is matrixed and multi-dimensional.
There are numerous DCs that get updated at different times, holding data from different times.
Further, if there are any "SERPs compilation changes" due to things like dupes/weak pages etc.,
they may not be applied to all DCs at the same time... the change may roll out and catch up... or,
if your site is responding poorly on occasion, it may result in "waves" through the
DCs as the responses fail/improve/fail/improve... or as those of other sites do, etc.
"But you are NOT going to the Same DC every time - are you!"
ALL DCs are updated within the same 24 hr. period, more or less,
without knowing the exact scheduling of events.
That means, that ALL of them, at some point during the 24 hr. period
would come to the same number for your particular site, which is
exactly what I have observed upto now. The updates of one of your
domains may create a variation between different DCs DURING THE
TRANSACTION TIME ONLY, which lasts for a given site no longer
than a couple of hours, considering it is done in even smaller chunks.
So, after a couple of hours, ALL DCs stabilize as to your particular site
and all of them should show the same exact number, which is what we
are seeing here all the time.
REGARDLESS of bottlenecks and granularity of transaction.
Meaning, by "grand" transaction as to your particular site, is COMPLETE
update to bring that particular DC to full correspondence with the results
of the latest update.
That "grand" transaction is not necessarily performed all at once,
but is broken into smaller transactions.
After "grand" transaction is finished, ALL DCs contain the same data.
So, whenever you see the variations in your FIRST SERPs page,
that means a "grand" transaction is in progress for that particular domain.
During that time, you WILL experience MAXIMAL variations, that, at the
same time, can not possibly exceed the max. crawling rate given a
particular set of parameters for your ranking, etc.
Top Contributor - Webmaster Help Bionic Poster - 7/4/09
Of course, we are also discounting the possibility of Resource Preservation
- maybe G introduces cut-offs on the allocation of resources for the estimates etc.
Maybe it's only done on certain DCs.
Maybe it's only done at certain times of day, or on every X number of requests.
Understood. I am not viewing this as some kind of flat, linear, single-dimensional event.
Trust me.
My own filters are multi-dimensional too. :--}
And I have a pretty good idea of exactly what is involved in multi-DC updates,
except that it is too technical for this level. We are already way too deep into technical details.
But again, the bottom line is this: it can NEVER jump from 3,000,000 down
to 5,000, no matter what.
Top Contributor - Webmaster Help Bionic Poster - 7/4/09
You are clearly wrong - because it is happening.
That suggests there is either a flaw in the reasoning, a flaw in the equation,
or a flaw in the information you are basing all this on.
And, before you argue that point, think on it...
IF the SERPs are showing these varied results, then it IS happening.
Irrefutable proof - correct? Supplied by yourself - correct?
Therefore, it is possible.
There ARE strategies to ensure you do not get such drastic deviations.
Basically, either a transaction completes,
or it does not.
If it does not, that means your system is broken and in an inconsistent state.
And if THAT is the case, NONE of your existing data can be trusted.
You would have to unwind the transaction, get back to the state you were in
before it started, and then reschedule it and do it again later,
no matter what kind of resource bottleneck there is.
OR
the transaction does complete, no matter how long it takes, depending
on traffic jams, resource starvation or all sorts of other things.
It all translates into your propagation delay (meaning the length of time
it takes to complete a "grand" transaction for that particular time).
ALL other cases indicate a fundamentally broken system,
which can never come to a state of consistency.
Meaning: ALL RED ALARMS ARE ON,
and I am not going to tell you which is the MAIN one.
So, let us keep it within the same scope of reality.
The program on your box either works, or it does not,
at least as far as major functionality is concerned.
When you write to a file, you reasonably expect ALL the data to be written,
regardless of how small your memory is and how many threads you are
running at the moment. It ALL translates into propagation delay, or a TOTAL
deadlock - such as when you run out of disk space and there is nowhere to
write, or you run out of memory and out of swap, so no more memory can be
allocated or swapped. You are in a deadlock state.
In a deadlock state, G stops responding to ALL sorts of things.
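The write-it-all-or-not-at-all expectation described above is exactly what an atomic file update gives you. A minimal sketch using the standard write-temp-then-rename idiom (nothing specific to Google; purely an illustration of the all-or-nothing property):

```python
import os
import tempfile

def atomic_write(path, data):
    """Replace the file at `path` with `data` all at once: readers see
    either the complete old contents or the complete new contents,
    never a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)  # the atomic step (POSIX rename semantics)
    except BaseException:
        os.unlink(tmp)
        raise
```

If the write fails partway, the temp file is discarded and the original is untouched; the rename only happens once the new data is fully on disk.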
Do you comprehend what we are talking about?
Top Contributor - Webmaster Help Bionic Poster - 7/4/09
Yes.
And you appear to be making assumptions about how Google handles the push/update process:
Is it fed from a single origin to all the others?
Is it fed from DC to DC in a predefined sequence?
Is it fed from DC to DC in a varied sequence based on load/resources?
Is it fed en masse every time, to each DC?
Is it fed en masse some of the time, to each DC?
etc. etc. etc.
Then you have to examine how they handle the flagging of issues...
If resources are bottoming out, does it slow up or cease?
If it ceases, does it quit and have to restart, or does it quit and resume?
Does a cessation affect others in the chain?
There are LOTS of potentials there.
Do you KNOW how G does it - or are you assuming?
Do you KNOW how G handles any encountered issues - or are you assuming?
And another spanner in the works - go back and look at your calculation of
how much is indexed and what the range should be.
Where does it factor in the potential removal of previously listed data?
You have 10 pages indexed.
You get between 2 and 5 indexed per day.
That gives you, within the next 24 hours, 10-15.
Then factor in that G may have decided that 6 of the original pages were junk.
That means it may total 4-9 pages - that's fewer than you started with!
Then add in that you may be talking to a different DC - one that is a little "out
of it"... it may still be showing data from the last update period...
so you may have 5 to 8 pages indexed.
Now what happens if the "junk" data is pushed ahead of the recently indexed data?
You may have even fewer pages listed... (technically it could be at -1 for a short period, yes?)
.
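Tightening the arithmetic in that example slightly (10 pages plus 2-5 new, minus 6 junked, gives 6-9 rather than 4-9, though the point stands either way), the bookkeeping looks like this:

```python
def indexed_range(current, new_min, new_max, junk_removed=0):
    """Possible indexed-page count after one update cycle: current count,
    plus the range of newly indexed pages, minus pages dropped as junk."""
    return (current + new_min - junk_removed,
            current + new_max - junk_removed)

# 10 pages indexed, 2-5 new per day, 6 of the originals junked:
print(indexed_range(10, 2, 5, 6))  # (6, 9) - fewer than you started with
```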
Is that making sense?
And I'm not convinced about the "24 hours" either.
I've seen some sites take more than 3 weeks for a change to appear in the SERPs.
It depends on how "important" the site is in G's opinion.
So there is yet another missing factor to include... what happens
if G decides that your site is not as important all of a sudden?
That means not only may the update take longer...
but you may get fewer pages crawled... and possibly even less in the index from that crawl!
Top Contributor - Webmaster Help Bionic Poster - 7/4/09
JohnMu (Google Employee) and 3 other people say this answers the question:
Again, you are being assumptive.
How do you know which figures are correct?
Particularly in light of the fact that you are doing a search for a "term" and
not for a site:
site:whatever.com
will yield data for a specific domain;
whatever.com
will yield data from multiple domains.
That means you may be dealing with a damn sight more variance.
.
I am NOT stating you are wrong.
What I am stating is that I think you are not viewing it correctly,
not taking numerous other potentials into account, and are generating skewed results.
(That doesn't make you wrong - it just means the examples provided are next to useless.)
Top Contributor - Webmaster Help Bionic Poster - 7/4/09
Now, personally, I've had enough.
I've given multiple examples of your equation being non-exhaustive and lacking
additional factors.
I've explained some of those other factors.
I've pointed out that some of your data is non-relational.
You are simply going to sit there and refute it.
Do so at your leisure.
I personally cannot be bothered to waste any more time on someone who isn't
going to acknowledge the potential flaws... and I doubt anyone else will either.
So please, do sit here and continue to be assumptive.
If you are lucky, someone from G may pop in.
.
Best of luck in getting an answer, though.
"And you appear to making assumptions on how Google handles the psuh/update process.
Is it fed from a single origin to all others
Is it fed from DC to DC in a predefined sequence
Is it fed from DC to DC in a varied sequence based onload/resource
Is it fed enmasse every time, to each DC
Is it fed enamasse some of the time, to each DC
etc. etc. etc. etc. etc. etc. etc."
Correct. All these things were considered.
"You have 10 pages indexed.
You get betwen 2 and 5 indexed per day.
That gives you, within the next 24 Hours, 10-15.
Then factor in that G may have decided that 6 of the original pages were junk.
That means it may total 4-9 pages - thats less than you started with!"
There are rules in transaction processing, which force you to introduce
only predictable amount of new data to the system so the whole system
still remains CONSISTENT. The principle of consistency is the MOST
critical principle.
So, at EACH stage of the game, no matter what kind of multi-dimensional
anything you can even begin to conceive, your system MUST remain
CONSISTENT. If, at ANY given point, it is inconsistent, it means DEATH
to the whole system.
That is why databases and SQL work.
Otherwise, all the computers and all the business on the planet Earth would stop.
In multi-dimensional systems, you always have a delta (a smallest part of transaction).
So, you introduce YOUR delta into some DC's context.
Once that DC updates itself with YOUR delta,
it then provides YOU with ITS delta.
At this point you are both consistent with each other.
So, yes, the issues are highly complex, because there are not just 2 DCs,
but many. But at any given junction, there is either reconciliation between
DCs one to one, or of ALL delta DCs against the MASTER, or reference, DC.
So, transactions may be conducted via three-way exchanges, in smaller
stages. But this is WAY beyond the scope of this thread.
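The pairwise delta exchange described above can be sketched in a few lines. This is an illustrative toy model only - the class and method names are mine, and nothing here claims to reflect Google's actual datacenter update mechanism. It just shows how two replicas that swap their pending deltas end up mutually consistent:

```python
# Toy sketch of pairwise delta reconciliation between two replicas.
# All names are illustrative; this is NOT Google's actual mechanism.

class Replica:
    def __init__(self, name):
        self.name = name
        self.docs = set()      # document IDs this replica has indexed
        self.pending = set()   # local changes not yet shared (the "delta")

    def index(self, doc_id):
        """Locally index a new document, creating a pending delta."""
        self.docs.add(doc_id)
        self.pending.add(doc_id)

    def exchange(self, other):
        """Two-way delta exchange: each side applies the other's delta.
        After this call, both replicas hold the same document set,
        i.e. they are mutually consistent."""
        my_delta = self.pending
        their_delta = other.pending
        self.docs |= their_delta
        other.docs |= my_delta
        self.pending = set()
        other.pending = set()

a = Replica("DC-A")
b = Replica("DC-B")
a.index("page-1")
b.index("page-2")
a.exchange(b)
assert a.docs == b.docs == {"page-1", "page-2"}
```

With many replicas the same idea applies pair by pair, or with every delta replica reconciling against one reference replica; the invariant is always that each completed exchange leaves the two parties consistent with each other.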
Tell me, is there ANY way some developer from appropriate department
may get to look at this data?
Is there a developer's forum or something of that kind?
Finally, both the lower and the higher numbers ARE consistent,
which indicates we are not getting some random data in the middle
of a transaction. Otherwise, we would be getting a random lower number.
But that lower number remains the same, within the normal variation typical
of updating any single DC.
Top Contributor
Webmaster Help Bionic Poster
7/4/09
>>>
Ok, I see what is going on. It is a bug.
So what? In what way does this "bug" affect you, or me, or anybody else?
In all those threads you opened for this crap, I've not once seen any kind
of explanation of why you freak out over a couple of contradictory numbers.
Why don't you stick to monitoring and increasing traffic instead of making a
fuss over something you don't have and never will have any control over?
Anyway, I find your examples quite questionable, and don't really understand
what you want to say. Whenever you do this search:
http://www.google.com/search?q=mfcgoldmine.uuuq.com
(3.800 for the complete string or 1.030.000 for the string and parts of it,
your site does by NO means have a million references *LOL* you better forget that
erroneous newbie assumption right away *LOL*)
you're not querying for your domain or something, you're searching
for the string "mfcgoldmine[dot]uuuq[dot]com" inside the content of
your and other sites, whereas searching:
Top Contributor
Webmaster Help Bionic Poster
7/4/09
pgelqued,
there is no point discussing numbers larger than 1,000 returned by the search info, because
they are only estimates,
and you cannot go beyond 1,000 results in search listings anyway.
Also, the total number can be affected by duplicate content
found by Googlebot at the time you do the search.
Autocrat is right:
specify a URL that is not indexed in search results, and that you expect to be indexed.
And please make your posts shorter, because the way you write them
makes it very difficult for people to follow what problems you are raising.
Do you think this answers the question?
Top Contributor
Webmaster Help Bionic Poster
7/4/09
1 person says this answers the question:
Seriously people - it's not worth it.
I spent ages last night pointing out the various holes/flaws/problems with the whole thing,
the OP isn't interested in being shown they are wrong.
They want to be told they are right ... or that Google is wrong.
I suggest just boycotting this one and letting them get on with it.
(And thanks JM ;) )
Do you think this answers the question?
First of all, dear luzie, Autocrat, cristina and others who get so angry
(I can just hear the stomping of your feet and the grinding of your teeth), let me ask you this:
Why are you SO upset about this, like Google's LIFE depended on it?
If you are so convinced of your totally erroneous and utterly inapplicable conclusions,
and I mean EVERY SINGLE ONE OF THEM, then just move on, and do something
creative for once in your lives, instead of engaging in insult, ridicule and outright
harassment.
What is at stake here for you?
Why does it bother you so much?
Why do you need to attack someone with totally erroneous and insulting conclusions
that do not correspond to reality and hard facts and hard data as presented?
And ALL of you are "Top Contributors".
And ALL of you know the rules of conduct on these forums.
Does THIS kind of behavior create "positive user experience for everyone"?
I thereby make an official request to authorised Google personnel to take
appropriate action to stop this uncalled-for behavior. If these people behave
as they do, and this is not an isolated case by ANY means, and there is
plenty of evidence on record, then the very motives of their participation on
these forums are suspect.
Furthermore, I would like to mention this:
Several threads related to specific issues raised in this thread were removed.
Why?
WHO did a removal?
For what exact purpose?
Ok, let us look at one specific argument from your side:
luzie,
"http://www.google.com/search?q=mfcgoldmine.uuuq.com
(3.800 for the complete string or 1.030.000 for the string and parts of it,
your site does by NO means have a million references *LOL*
you better forget that erroneous newbie assumption right away *LOL*)"
Incorrect and TOTALLY irrelevant to the exact issue on the table.
The problem is that the search result cannot possibly fluctuate as badly
as we see here.
First of all, even if the search engine breaks the mfcgoldmine.uuuq.com string
into 3 different tokens, still mfcgoldmine, and especially cppgoldmine, are
so unique that the statistical chance that your search result will indeed refer
to one of the sites in question is probably well over 90%. But, in order to make a
definite conclusion, a detailed study needs to be conducted.
Secondly, there is no stemming involved here, if any of you know what it means.
If you don't, do a google search on search+engine+stemming.
But fine. Let us eliminate that component by performing a
LITERAL string search, that is, "mfcgoldmine.uuuq.com".
The CORRECT result for search on "mfcgoldmine.uuuq.com"
Results 1 - 10 of about 1,030,000 for "mfcgoldmine.uuuq.com". (0.32 seconds)
Still, EXACTLY the same as in the original case. Furthermore, it is TOTALLY
consistent with ALL previous samples going back several days, which tells you
what?
For the incorrect result, which is seen in over 98% of cases, you can perform the
search yourself.
The INCORRECT result for the exact same search under the exact same conditions,
obtained by simply pushing the Search button again, is:
Results 1 - 10 of about 3,700 for "mfcgoldmine.uuuq.com". (0.05 seconds)
This is TOTALLY inconsistent. A properly functioning system
cannot possibly produce these absolutely astounding variations
of over 60 times for the exact same search under the exact same conditions.
Search on "javagoldmine.uuuq.com"
Results 1 - 10 of about 460,000 for "javagoldmine.uuuq.com"
Clicking on Search button again produces this result.
Results 1 - 10 of about 7,200 for "javagoldmine.uuuq.com"
Conclusion: THESE RESULTS ARE TOTALLY INCONSISTENT
AND INDICATE THAT EITHER SEARCH ENGINE IS BROKEN
OR RESULTS ARE ARTIFICIALLY MANIPULATED.
ALL of the 1,030,000 results for "mfcgoldmine.uuuq.com" DO EXIST in the index,
within a reasonable range, considering that this is an estimate
that could conceivably vary within +/- 10%.
Therefore, the lower numbers are totally incorrect.
The consequences of such a behaviour on ANY kind of search are hard
to even begin to estimate without conducting a detailed study.
Correction: the search result for "mfcgoldmine.uuuq.com" is not 60 times off,
but 278 times!
And this kind of difference can never happen in a properly functioning system
under ANY conditions conceivable.
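For the record, the arithmetic behind this correction is trivial to check. A minimal sketch, using nothing but the "about N results" estimates already pasted in this thread:

```python
# Discrepancy ratios from the "about N results" estimates quoted above.
pairs = {
    '"mfcgoldmine.uuuq.com"': (1_030_000, 3_700),   # high vs. low estimate
    '"javagoldmine.uuuq.com"': (460_000, 7_200),
}
ratios = {query: high / low for query, (high, low) in pairs.items()}
for query, r in ratios.items():
    print(f"{query}: off by roughly {r:.0f}x")
# "mfcgoldmine.uuuq.com": off by roughly 278x
# "javagoldmine.uuuq.com": off by roughly 64x
```

So 1,030,000 / 3,700 is indeed roughly 278, not 60.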
All other snapshots of correct results are available and will be presented
in due time.
WARNING: If this thread is removed again, we are going to be dealing
with a totally different set of issues, and the results may not necessarily be
the ones you are trying to achieve.
Thank you for admitting you are the one who removed these threads.
As far as your exact statement:
"But it was me, because valuable volunteer time was being taken up to no useful purpose"
I can tell you this: NOBODY IS FORCING YOUR VALUABLE VOLUNTEERS
TO PARTICIPATE ON THIS THREAD.
Especially considering the fact that NONE of them have sufficient competence
in the issues of search engine internals. Otherwise, they would be working as
DEVELOPERS at Google, making a cool couple of hundred thousand dollars a year.
So, there is no need to even bother.
As I repeatedly stated already: I am only interested in COMPETENT opinion
by someone who can even begin to understand these kinds of discrepancies,
and that is either top level developers at Google or architects that know the
exact mechanics of search engine internals.
I would also like to mention that it seems totally inappropriate
to allow non-Google-authorised personnel to remove threads,
especially in the context of conflict-of-interest issues and the very fact
that the issues addressed in these threads lie WAY outside their
level of competence.
Top Contributor
Webmaster Help Bionic Poster
7/5/09
2 people say this answers the question:
> I can tell you this: NOBODY IS FORCING YOUR VALUABLE VOLUNTEERS TO PARTICIPATE ON THIS THREAD.
No and I don't need to recommend they stop, because they all seem to have taken that decision on their own.
As regards getting Google's attention - this week you pretty well won't. The ones who normally monitor this forum are either out of their offices doing other things or on vacation - monitoring by Google is pretty much "on demand" only.
Even the ones that do monitor the forum regularly when they're here have pretty much given up trying to talk to you. Nothing is broken, nothing needs changing, and even if it did you are only one of tens of millions of webmasters - why break something for ten million to please one? No one else - of all the millions - is having this problem. Believe me - Google is NOT going to change ANYTHING to please you.
It comes back to the fact that you have a spectacularly poor, mostly copied and garbled set of interlinked sites that are of very little use to anyone.
Do you think this answers the question?
"Believe me - Google is NOT going to change ANYTHING to please you"
It is not a matter of pleasing anyone. These results refer not to a single isolated case, but to several cases, and there are reasons to believe such discrepancies would cause similar discrepancies in other search engine queries regardless of site.
As to your opinion regarding the quality of those sites and the information they contain, it is simply inappropriate and not conducive to "positive user experience for everyone".
Furthermore, it is totally irrelevant as far as exact set of issues discussed are concerned.
As I said before: I did not come here asking for the ways to improve either ratings or performance. This is a non issue at the moment.
Furthermore, your very first statement on the original thread you admitted to having removed (Question: Google index drops like a rock) is this:
"Top Contributor Webmaster Help Bionic Poster 6/30/09 It's possible "selected from very large archives" is a euphemism for "stolen from the Library of Congress".
I, personally, find these kinds of questions HIGHLY offensive, and DEMAND an explanation, that can possibly justify this kind of position.
Do you have ANY evidence or facts or reasons to believe that this collection was either "stolen from the Library of Congress", or a subject of ANY kind of copyright related issues?
Furthermore, your following statements on the original thread and on this one have been successfully refuted.
"We don't need to prove it. Simple numeracy tells us this is an unusual phenomenon. There are around sixty million indexable domains on the planet. If 1% hit such problems we'd have 600,000 aggrieved webmasters posting here and in every newspaper that's printed.
The simple fact is - you're all on your own. One in sixty million is fifty times less likely than winning the UK's national lottery."
You cannot possibly prove such an assumption, and it is TOTALLY invalid to begin with, as has been described in the original thread, which YOU, personally, have removed, and I quote:
"Not necessarily. First of all, they were all told, and MANY times over, to the point of being zombified, that Google does NOT guarantee anything and their index MAY and DOES fluctuate, no matter what they think is reasonable.
Plus, how many webmasters do you think are willing to rock the boat and tell the business owner he is losing millions because of those wild fluctuations? Why would anyone in his clear mind do that? You see, it is much more profitable just to keep quiet and pretend you did not see any of it, cause there is nothing you can do anyway, instead of rocking the boat and possibly losing your job, if the boss learns that he lost millions "because of this clueless bozo, who calls himself SEO".
Get the picture?
Finally, how many webmasters even participate on these forums? According to the way you do YOUR statistics, it is less than 0.00000000000000000000000000000000001 % of all the webmasters in that huge ocean called Internet."
[...]
"How many web masters even KNOW they are having these wild fluctuations?
How many of them keep the constant watch of their google index?
How many of them take regular snapshots of their statistics, at least as to the number of total pages indexed by Google from your sites?
Does Google provide charts of the total number of pages indexed by Google from your site, even in Google Analytics?
How many webmasters do you think can produce a running report of their total pages indexed and post it here?
Would you like me to produce one for you and see if YOURS is as good as mine?"
And you had no argument on it whatsoever. So...
This is one more chance for you to prove your point in order to restore your tainted reputation.
"It comes back to the fact that you have a spectacularly poor, mostly copied and garbled set of interlinked sites that are of very little use to anyone."
It is simply outrageous. Simple as that, and the EXACT information has been provided to you on threads you have removed.
These sites happen to be the REFERENCE sites for a number of Universities and other educational institutions.
These sites are REGULARLY visited by the biggest software houses in the world, such as Microsoft, Sun, Intel, HP and other biggest names in software, hardware, business, banking and finance, leading world manufacturing corporations, governments and even the military.
These are probably the cleanest sites on the net as far as producing PURE content goes: without a single ad, on the cleanest pages that do not have ANY kind of visual garbage whose sole purpose is to milk the site for ad revenue - which is exactly why MOST sites on the net contain very little useful information on a page, as a ratio of useful information to total page size.
Some of the "top ranking" sites contain less than 10% of on-topic, useful information on every single page. It goes as far as having 2-3 sentences relating to the issue and topic, and pages worth of all sorts of advertising spam.
There are PLENTY of reasons to believe that the top ranking sites, at least as far as issues covered by Goldmine collections go, are in fact the biggest spamming sites there are.
The vast majority of the information they provide is nothing more than marketing spam.
This particular issue is one of the central issues of Goldmine collection organization. The article pages contain NOTHING but exact information, extracted with the most sophisticated filtering technology that exists at this junction, and are GUARANTEED to correspond to a chapter Title with WELL over 90% certainty.
The ratio of useful, hard-to-find, competent information to the total number of articles on a given chapter's topic is probably the highest you can find ANY place on the net in the context of similar information.
The amount of practical code examples and snippets on subjects covered by the collections is simply unprecedented; it allows one to find the answer to ANY conceivable issue or the most difficult problem one might have.
The VARIETY of examples, views, expert opinions on ANY given subject or topic reflects the best of the best, the state of the art and is probably the most valuable collection of similar information existing on the planet Earth at this particular junction.
Because...
Well, because of a simple fact: "If we don't have it, it probably does not exist".
Add to it: and if you find a more precise collection of similar information ANYWHERE on the net, that includes this kind of coverage, depth and precision, including, but not limited to Google's own collection, considering the ratio of valuable/total information, Microsoft, Sun, Intel, IBM, or you name it, I would be curious to see your references.
VAST majority of similar collections are simply a garbage dump, that contains every single article regardless of its appropriateness to a given subject or topic.
The chances of you finding truly useful information in that garbage dump are less than 1% in most cases, if not much worse than that.
So, I find these kinds of remarks by TOTALLY incompetent individuals, such as all those foaming at the mouth and throwing around all sorts of mud, insults, harassment, humiliation and ridicule, totally off base, totally ungrounded and totally incompetent.
All articles in collection archives are guaranteed to be unique with 100% certainty.
All articles in any chapter are guaranteed to be unique with 100% certainty.
SOME articles may appear in more than one chapter if the issues covered by that particular article are DIRECTLY applicable to a different chapter with well over 90% certainty, which is unprecedented for similar collections on the net.
ALL article pages are validated under the strictest HTML standards possible, and that is HTML 4.01 Strict.
Yes, as a result of recent changes, one or two articles out of an average of 50,000 articles in these collections do indeed have validation errors, which, nevertheless, do not affect the page rendering or the ability of Googlebot to index these collections to FULL extent. There is a guaranteed path from the top-level index page to every single article in a collection, regardless of what kind of browser is used and whether CSS is enabled.
There is no link stuffing, hidden text meant to be exploited in order to artificially cause the page rank to go higher, or any other tricks used to artificially inflate the ratings.
As far as "farms" go, the argument of the opposing side is totally invalid, as has been explained in the articles removed.
Each of these collections has at least one mirror site, which is 100% duplicate of the original site. It has exact same index pages, exact same articles and exact same everything.
There is no benefit to having a mirror site in terms of any kind of rating. No matter how many mirrors there are, there is only one article that can be viewed, and it does not matter from which mirror. The page view count is not going to go higher just because a particular page was accessed from a different mirror.
Mirrors are used extensively on the net to increase reliability and decrease the traffic load.
The Web has an inherent problem related to single points of failure. If a page is served from a single site, then any kind of attack on that site can take the whole information library offline.
Since these collections are used by professional programmers 24 hrs./day, and those programmers have the toughest issues to resolve in the shortest possible time frame, it is IMPERATIVE that these collections be protected by reliable mirrors. Our mirrors are some of the most reliable mirrors on the net, and page load time is among the best in the industry, for some very specific reasons that INHERENTLY make these mirrors the most reliable possible: they do not allow ANY kind of executable content, including, but not limited to, PHP, any kind of scripts, shell access, etc. The ONLY thing allowed on these mirrors is the simplest non-executable SSI statements, and THAT is one of the reasons these are some of the most reliable sites on the net.
Furthermore, Googlebot can EASILY discover that these mirrors are in fact mirrors and not just some trick to inflate the ratings.
http://jsgoldmine.uuuq.com (this one does not even have mirror)
On top of it, ALL of these collections clearly belong to the same Google webmaster account. There are no tricks used to hide ANYTHING.
So, Google has MULTIPLE ways and means to distinguish the essence of "interlinking" as far as any conceivable aspect goes.
Finally, if Google decides to penalize the valid mirrors, the net effect on the Internet will be disastrous.
First of all, it will drastically reduce the availability of most collections of information and all sorts of distribution channels. Some of the most valuable resources on the net will simply become inaccessible as they will be dropped from Google index, which, by now, is the biggest resource on the net and is recognized as a #1 choice for all searches on the net.
Top Contributor
Webmaster Help Bionic Poster
7/5/09
2 people say this answers the question:
Demand all you like sunshine.
You've been told your logic is faulty. You've been told that you are not including all the relevant factors. You've been told that your method of examining references is incorrect.
If you are too damn stupid to accept all of that - from multiple people ... and to just get on with it - that's down to you.
Do NOT expect anyone else to make the effort to aid you. do NOT expect anyone else to bother giving you any attention.
if I see you making multiple Topics about the same STUFFING THING - I'll * well delete/report!
Am I Clear?
(And I'm not kidding - I'm sick to damn death of your whinging and refusing to acknowledge you're wrong - but that is your choice. What I don't have to put up with is you trashing this forum/community, nor upsetting other posters or regulars whilst you're being stubborn!)
Do you think this answers the question?
I find these kinds of remarks totally off base and not in line with "positive user experience" issues and guidelines.
As for the "link stuffing": the exact URLs were given in the argument related to site interlinking so everyone could see exactly what we are talking about, and the accusation simply looks strange, especially considering the fact that these exact URLs already appear in the same thread, only in a different context.
Finally, again, there is no need to get upset to the point of blowing up. If you, personally, do not find this thread of interest to you, there is no need to interfere or even bother about it.
The issues ARE valid and specific evidence was provided, and, as stated before, not a single opposing argument so far corresponds to the exact issue being discussed in this thread.
Just relax. Why be so worked up about it, especially if it has nothing to do with your problems?
Or do you have some kind of vested interest in this information being suppressed?
Your reactions seem too strange, too overboard and too uncalled for.
Does anyone bother you? Does anyone ask for YOUR particular opinion?
You presented your position and it has been reviewed and evaluated to full extent.
Nice picture. I think some of the top programmers in the world that use the Programmer's Goldmine collections would find it pretty entertaining and representative of the "pleasant user experience for everyone" slogan.
Since YOU have posted this picture, you probably know what is the meaning of it.
I would just like to ask a question:
Why did you put up a picture of a young African American guy, whose hands could be tied behind his back, and who has a football stuffed into his mouth?
Is it a message to all people? Is this your opinion of what Google thinks about its customers and users?
Interestingly enough a couple of "Top Contributors" even clicked on "Yes" button, next to "Do you think this answers the question?"
You guys seem to have plenty of sense of humour.
Does it make you feel better about yourself? Like you are some kind of "elite", who has the authority not only to insult, ridicule, humiliate and harass people, as you do here all the time it seems, but, for some strange reason, is even allowed to delete other people's posts!
An unusually broad authority given to you by Google, I'd say. I am just curious, how does Google select its "Top Contributors" and what are their authorities and relationship to Google in general?
Are some of you PAID for your noble efforts to help those "clueless" webmasters all day long?
Top Contributor
Webmaster Help Bionic Poster
7/5/09
"... The issues ARE valid and specific evidence was provided, and, as stated before, not a single opposing argument so far corresponds to the exact issue being discussed in this thread. ..."
Hello? This is your early wake-up call from REALITY!
As stated (now several times) your methodology is FLAWED. Scroll up. Look at the response ticked as Best Answer. Look at who and what ticked it.
A Google Employee has visited. A GE has made a judgement.
Take the * hint!
Do you think this answers the question?
Autocrat, Becky Sharpe, webado, Phil Payne, cristina, luzie, Kevin-UK think this is the "Best answer".
Basically ALL the "Top Contributors" think the same way.
Well, I guess the majority opinion DOES define Truth. If everyone thinks the Earth is still flat, then it MUST be!
If everyone thinks that planet Earth IS the centre of the Universe, then it MUST be.
Otherwise, they would not burn some guys for making such proclamations.
But...
There is a very little but:
You see, properly working systems I know of, and I know plenty about that stuff, do not work like this. Otherwise... Everything is nice and kosher indeed.
But, it's been a pleasure to hear some enlightening views as to Google internals, the distributed nature of DC (Datacenters), the procedures and principles of transaction updates, roll-backs and all sorts of other useful things, including the difference between the mfcgoldmine.uuuq.com and "mfcgoldmine.uuuq.com" searches, that, for some strange reason, showed exactly the same result nevertheless.
Top Contributor
Webmaster Help Bionic Poster
7/6/09
1 person says this answers the question:
You can make out that you know what you are talking about as much as you like. But anyone reading this (sympathies to them) who sees the searches you were making will KNOW you are clueless. Anyone examining your method of calculation will also spot the numerous screwups, miscalculations, lacking factors etc.
In short - no one is going to think you are smart nor knowledgeable on this.
Too many errors, too many assumptions. I'm not a data engineer - and I can see how far fetched your approach is.
I honestly pray a Google Engineer pops in on this.
Just so you can shut your * cake-hole :D
Do you think this answers the question?
Results 1 - 10 of about 5,850 from mfcgoldmine.uuuq.com for threads
Search:
site:mfcgoldmine.uuuq.com
Results 1 - 10 of about 1,250 from mfcgoldmine.uuuq.com
I wonder if anyone can explain THIS one. What mere mortals would probably conclude is that, according to the SERPs, there are more articles in a single chapter than in the entire site.
Which one is right in this case? Are they BOTH right? One is right and the other one is wrong? BOTH wrong? None of the above? And ALL of above included?
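The contradiction can be stated mechanically: a query restricted to one chapter of a site can never legitimately report more results than a site: query covering the whole site, because the chapter's pages are a subset of the site's pages. A one-line check with the two estimates quoted above (variable names are mine, for illustration):

```python
# Estimates quoted above; variable names are illustrative.
chapter_estimate = 5_850  # "about 5,850 from mfcgoldmine.uuuq.com for threads"
site_estimate = 1_250     # "about 1,250" for site:mfcgoldmine.uuuq.com
# A subset count cannot exceed the count of the whole,
# so at least one of these two estimates must be wrong.
print(chapter_estimate > site_estimate)  # True, i.e. inconsistent
```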
"And now, the Truth has been spoken" -- Sankaracharya, India 5000 B.C.
Now, WHO said "They probably NEVER existed" or something of a kind?
Lemme see here...
Oh, I see, familiar faces, Top Contributors, those same people that for some strange reason delete a perfectly valid thread after all the "Top Contributors" had enough fun insulting, humiliating, harassing and abusing.
Interestingly enough, the Google employee came and told me something along the lines of: "be nice, these are nice forums, and all nice user experience should not be interrupted". Sure, this is not a literal quote, but we can dig that one up easily.
Anyway, here we go, and I quote:
Phil Payne Top Contributor Webmaster Help Bionic Poster 7/2/09
"> But can you tell me the possible reason for such a rapid blips in index (from 1410 to 59,600 and back to 1060, all in one day?
Yep, I know what you see in 98% of cases from my statistical estimates.
Now, can you add those few chapters above from that single site and see how many articles we already have, and we are about 10% into the site, as far as site size is concerned.
Impressive, I tell you.
Now, anybody is interested in seeing even MORE exciting numerology here?
And this one is hard to classify as anything but unbelievable. Why?
Well, do you know how many articles are there in Debugging Experts chapter?
Well, Articles: 1872, just in the latest run, and there are probably 3000 articles there as old versions that Google did not remove to this day, even though the sitemap was removed and the sitemap file itself was deleted as well.
They even went as far, as making claims that I am "spamming" this forum.
Who, ME?
And what about YOU? Don't you have a vested interest in this whole thing with your constant advertisements about your great services in your favicons?
This looks like about the best place under the Sun to collect some clients, while doing all this "volunteer work", doesn't it?
Because it really looks strange that all these high-powered SEOs are hanging out here, for free, as some of them state, trying to help the biggest name in the information business. You mean Google cannot afford to PAY COMPETENT personnel to take care of its own beloved customers, even being as big as Google is?
And you mean ALL these people are here to help ANYONE, but themselves? Doing all this work for free?
So far, I have not seen a single one of them who is genuinely interested in helping ANYONE. This seems like a feeding orgy to them.
On top of it, they, for some strange reason, are given the power to even DELETE any articles they like, regardless...
One more time for all the "experts", "Top Contributors" here.
The correct and incorrect Google search results are:

Search on site:mfcgoldmine.uuuq.com
61,800 from mfcgoldmine.uuuq.com - correct result
1,160 from mfcgoldmine.uuuq.com - incorrect result (off by a whopping 40 times)

Search on mfcgoldmine.uuuq.com domain, without site:...
Results 21 - 30 of about 1,030,000 for mfcgoldmine.uuuq.com - correct result
Results 1 - 10 of about 3,360 for mfcgoldmine.uuuq.com - incorrect result (off by a whopping 250 times)

Search on site:mfcgoldmine.by.ru
37,900 from mfcgoldmine.by.ru - correct result
1,490 from mfcgoldmine.by.ru - incorrect result (off by 20 times)

Search on mfcgoldmine.by.ru domain, without site:...
Results 1 - 10 of about 826,000 for mfcgoldmine.by.ru - correct result
Results 1 - 10 of about 4,460 for mfcgoldmine.by.ru - incorrect result (off by 200 times)

Search on site:cppgoldmine.uuuq.com
32,800 from cppgoldmine.uuuq.com - almost correct result, but lower than it should be
2,880 from cppgoldmine.uuuq.com - incorrect result (off by 10 times)

Search on cppgoldmine.uuuq.com domain, without specifying site:...
Results 1 - 10 of about 543,000 for cppgoldmine.uuuq.com - correct result
Results 1 - 10 of about 4,320 for cppgoldmine.uuuq.com - incorrect result (off by > 100 times)

Search on site:cppgoldmine.by.ru
25,400 from cppgoldmine.by.ru - almost correct result, but lower than it should be
315 from cppgoldmine.by.ru - incorrect result (off by almost 100 times)

Search on cppgoldmine.by.ru domain, without specifying site:...
Results 1 - 10 of about 513,000 for cppgoldmine.by.ru - correct result
3,060 for cppgoldmine.by.ru - incorrect result (off by 150 times)

Search on site:javagoldmine.uuuq.com
10,400 from javagoldmine.uuuq.com - incorrect, but closer to reality (which should be > 40,000)
3,190 from javagoldmine.uuuq.com - incorrect result

Search on javagoldmine.uuuq.com domain, without specifying site:...
Results 1 - 10 of about 459,000 for javagoldmine.uuuq.com - correct result
Results 1 - 10 of about 6,200 for javagoldmine.uuuq.com - incorrect result (off by > 70 times)

Search on site:tarkus01.by.ru
71,200 from tarkus01.by.ru - correct result
15,400 from tarkus01.by.ru - incorrect result, but better than other incorrect results

Search on tarkus01.by.ru domain, without specifying site:...
Results 1 - 10 of about 471,000 for javagoldmine.by.ru - correct result
Results 1 - 10 of about 13,600 for tarkus01.by.ru - incorrect result
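For anyone who wants to track these estimates over time, here is a rough sketch (my own, not part of the original complaint) of how the "about N" counts could be pulled out of copied SERP summary lines. The regex and helper name are assumptions for illustration only:

```python
import re

# Hypothetical helper (my own, not from the original post): pull the
# estimated result count out of a copied Google SERP summary line such as
# "Results 1 - 10 of about 61,800 from mfcgoldmine.uuuq.com".
COUNT_RE = re.compile(r"(?:of about\s+|about\s+)?([\d,]+)\s+(?:from|for)\s")

def estimated_count(serp_line):
    """Return the estimated result count embedded in a SERP summary line."""
    match = COUNT_RE.search(serp_line)
    if match is None:
        raise ValueError("no result count found in: %r" % serp_line)
    return int(match.group(1).replace(",", ""))

# Two observations of the same site: query, hours apart, as quoted above:
high = estimated_count("Results 1 - 10 of about 61,800 from mfcgoldmine.uuuq.com")
low = estimated_count("1,160 from mfcgoldmine.uuuq.com")
print(high, low)  # 61800 1160 - the two estimates differ by roughly 50x
```

Logging the extracted numbers with timestamps would document the oscillation described below far better than screenshots.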
Some quotes from the thread "Question: Google index drops like a rock" that was deleted by Phil Payne, as he himself admitted on this very thread:
"because valuable volunteer time was being taken up to no useful purpose. And my fingers are itching again."
Are you SURE about this?
Quote:
Ashley Level 4 6/30/09
"So, you can stomp your feet and say you're brilliant (you probably are, I'm not debating this), but if you want to show up in Google you have to play by Google's rules"
I see...
Quote:
Ashley Level 4 7/1/09
2 people say this answers the question: (I know. Heard that before, and not once :--})
"Sorry about that...
"I just got off the phone with Dr. Google and your sites will be Number 1 tomorrow. All tied for Number 1. And no, I'm not cheating. It's for REAL!"
Ashley, are you cheating again?
How many days have passed and Dr. Google doesn't even move a finger... Cause all those SERP counts are TOTALLY off the wall, you know? It is kinda upsetting... I am a technical guy and I can not rest until things are humming like a Bentley. Actually, the Rolls Royce Silver Spur has very nice seats.
Plus, what kind of offer is this? Are you trying to bribe me? :--}
Quote by the next poster in the same thread:
KevinKatovic 7/1/09
"maybe we should short bio tech and google tommoro if it doesn't happen ashley."
Well, how would I know?
Don't ask ME on my thread. I am not running it. That other "expert" (at you know what by now) is running it. Ask HIM, Well, Ashley will do just as fine. Plus we are friends with her by now, with all these nice and "positive user experiences for everyone". It has been a PURE pleasure.
If Ashley drops me a line to the address on this page...
Well, maybe we should not introduce ourselves so fast. Who knows... We still have about 9 hours till I lose interest in this thread completely.
Oh, I just thought of something. Lemme see here...
Jeez, I don't believe this. Google stock down gap opening on July 2? Closed down 10.50 - 2.51%
Wow! That bites. That's how much?
About $2 billion bucks in one trading session? Surely Google could afford to hire competent pros, and even send them into search engine internals too, if this kind of money is at stake.
Ashley, are you sure Mr. Google is in town and you did indeed talk to him on the phone on July 1?
I have to talk to my astrologer. Cause July 1 comes right before July 2 I thought, and then BOOM!
Could someone please essplain this to me, cause I must be even dumber than that Autocrat says. Just look at his favicon. Do you see what kinda guy he is? PURE authority, and you better believe it, or else...!!!...!!!...!!! Grrrrr!!!
Plus just the way he talks, it almost shattered my screen. I had to hold it so it does not fall down the table the last time I heard him talking about all that "nice, positive user experience for everyone".
"The author of Programmer's Goldmine Collections is a blowhard who should be ignored."
Quote:
Ashley Level 4 7/2/09
"Blowhard? Goodness Phil. I love it!"
That's about the only not-so-nice thing Ashley said. Otherwise, she is a fine lady, full of dedication to free service, plus she is working here all day long without pay. I think she deserves at least a nice dinner in a fancy restaurant.
Ashley, btw, how was that dinner with your nice husband? Was it nice?
Btw, I have a nice Help button. I think I am going to patent the idea of having a Help button on web pages. Do you think it'd work? I promise to take you and your nice husband to a very nice restaurant, especially if you give me your expert advice on some things. Wink, wink.
Well, looks like I got a cold. Sneezing and all that, and I have to push some buttons here and there so all those automatic things work as advertised. Otherwise, you never know, you never know.
What is the best holistic remedy for cold? I heard heating up the red wine with pepper and then drinking half a glass when it is really hot really helps. But some say boil some milk and add a spoonful of butter. What's the latest fashion?
Well, another search does not seem to work as advertised. But we need to take a closer look at it before we make too much noise about it. One thing seems to be clear: if we have to do the searches by chapter (by specifying the full path to some chapter articles), that can only further reduce the number of articles as reported by the SERP when you do a site:... search.
And, since the topmost search results (for a site:... search) are obviously wrong, and by orders of magnitude, further narrowing the search scope by specifying the full path can not possibly produce more articles than the topmost search reports, and that means...
Well, we need to look at that closer, but if this is in line with previous results, then the overall impact would be devastating to the whole net.
And the reason for being forced to do the search by chapter is that some "experts" stated that it is the proper way to extract the maximum number of articles from the site, however ridiculous that might sound. But fine. We'll get you those numbers also, but I just don't work for free. Sorry, and my rates are kinda steep, especially considering the fact that I had to waste months seeing the site:... index dropping on me by about 40 times on average, which is TOTALLY off the wall.
And if I have to go through hell for a couple of months, and, on the top of it, get humiliated, insulted and harassed by these "volunteers" from around here, and for DAYS on, and have my perfectly valid articles removed, what do you expect as a result, Dr. Google?
I would really like to see this happen by the opening bell:
I want to see a solution that will allow my users to perform a
search_string site:mfcgoldmine.uuuq.com
type of search, and what I'd like them to find is the MAXIMAL number of articles that do exist in the Google index, and yes, I DO understand that "Google isn't even obliged to index any of my articles, and Google could care less if I exist," AND "Google is ALWAYS 'right' and I am ALWAYS 'wrong', no matter what, even if black turns out to be white, or vice versa."
In other words, the proposal by one of the "experts" from around here to conduct the search by narrowing the scope, that is, using a subdirectory, in order to INCREASE the number of articles shown, is UTTERLY contradictory to the most rudimentary principles of logic.
That is, a search of the type site:my.domain is the widest possible scope for a search on a given site. That means that specifying a subdirectory, as in my.domain/subdirectory, can not logically widen the scope of the search.
It is like saying that the number of files in a subdirectory can be larger than the number of files in the parent directory, which is utterly wrong in its most basic logical consequences.
And yet, this is EXACTLY what we are seeing here. According to sample searches produced in this very thread, the subdirectories MAY contain a number of articles LARGER than the parent directory, which is UTTERLY and COMPLETELY WRONG, and in the most profound sense there is.
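The containment argument can be illustrated with a toy filesystem (my own sketch, nothing to do with Google's index): every file under a subdirectory is also under the parent, so the subdirectory count can never exceed the parent count.

```python
from pathlib import Path
import tempfile

# A minimal sketch (not from the original post) of the containment argument:
# every file under site/chapter1 is also under site/, so a count restricted
# to the subdirectory can never exceed the count for the whole site.
root = Path(tempfile.mkdtemp())
(root / "chapter1").mkdir()
(root / "chapter2").mkdir()
(root / "index.html").write_text("home")
(root / "chapter1" / "a.html").write_text("a")
(root / "chapter1" / "b.html").write_text("b")
(root / "chapter2" / "c.html").write_text("c")

site_files = {p for p in root.rglob("*") if p.is_file()}
subdir_files = {p for p in (root / "chapter1").rglob("*") if p.is_file()}

assert subdir_files <= site_files           # scopes are strictly contained
print(len(site_files), len(subdir_files))   # 4 2
```

The same reasoning applies to any scope-narrowing query: a subset can never be larger than its superset.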
Furthermore, regardless of ANY conditions whatsoever, the initial search result (the one immediately after pushing the search button) can not jump 100 or even 1,000 times lower, as we did see in our case, which is evidenced by the screenshots provided.
Furthermore, this jumping magic was utterly consistent with previous results during SEVERAL DAYS, not hours, not even minutes, which indicates we were not in the middle of a transaction, and there was no unwinding or fall-back of any kind. The system can not be unwinding a transaction for several days. That is simply impossible. Otherwise Google would cease to function.
On top of it, what we see here are the normal variations of the lower-number index, as usually happen during normal operation, which indicates that we are definitely not unwinding anything.
On top of it, what we are seeing is that at least one of the sites has not been indexed during the last two months, while its exact mirror IS being indexed.
And we are seeing some other not so pleasant things.
Your index can not possibly jump from 3,000,000 to 3,000 within hours, back and forth. Is it oscillating? Between what and what? Why does it oscillate, especially considering the overwhelming tendency to produce the lower number in well over 90% of all cases?
One more time:
Does the Google search engine work, or does it not?
Does Google consider this kind of behaviour normal, and if so, would they kindly provide an explanation on a competent level (if they even care that anybody exists in the world to begin with)?
Is a search on a subdirectory broader in scope than a search on site:...?
What do I tell my users about the proper way of doing a search? I have over a hundred subdirectories, all parallel to each other. How do I do a search on the site and get the true number of articles that do exist in the Google index, as the subdirectory searches prove in no uncertain terms?
A whole lot of things to explain, and not to me only. By FAR.
And those people do not understand all this techie talk. That much is clear.
Did you see that opening down gap during the last trading session? The heaviest trade volume was during what time of session? What does THAT tell you, if anything? Because it does tell me some things indeed.
Are you God? Do you think what YOU think is something that defines anything? Something that proves anything beyond a shadow of a doubt? Matters to the very essence of argument?
What was your very first statement on one of the first thread related to these issues?
And what was your second statement?
So, you are proven to be biased, cowardly and destructive on top of it, as your 3rd statement regarding removal of the thread showed, with this "and my fingers are itching again", meaning to delete even THIS thread.
Ok, here is your chance, if THAT is the only kind of "solution" you are capable of producing with all your "holier than thou" assertions.
"Every now and then in the webmaster blogosphere and forums, this issue comes up: when a webmaster performs a [site:example.com] query on their website, the number of indexed results differs from what is displayed in their Sitemaps report in Webmaster Tools. Such a discrepancy may smell like a bug, but it's actually by design."
It goes on to mention "rough estimate" and it's sometimes very rough. It's not intended as a reconcilable interface and never was. Nor is it ever going to be.
And ALL of you, so-called volunteers, spending all day long here, "doing it for free" as you claim, have an inherent conflict of interest: ALL of you are "SEOs" of one kind or another, and the LAST thing you are interested in seeing is that the webmasters are right and Google turns out to be "wrong". Because otherwise, how can you milk these forums for customers if they are not "wrong"? Most of your proposals are quite meaningless in the scheme of things, and I have seen plenty of your proposals that do NOT correspond to reality, and that is what I am seeing and experiencing with my own eyes.
One of you told me in his very first statement: "hey, without 'professional' services, you are out of luck". I did not care to look at his favicon then, but I can just guarantee you, he was doing this wink, wink number on me.
So, you spam these forums and post all sorts of things just to promote your own services, for your own benefit, and not for some totally useless "volunteer work" ideology, going as far as engaging in these torturing orgies just for fun, and then, if you don't like ANYTHING for ANY reason, you can just delete someone's articles and entire threads.
Isn't THAT the case?
Or are you living manifestations of Jesus Christ, whose sole purpose here is to help those poor webmasters, without getting a penny in return?
Sure, if someone budges and refuses to admit to you that HE is the one who screwed up, not you, then he is not a customer, and hey, what is he worth then if you can not milk him, and what are the Google webmaster forums good for if you can not milk them, while doing all this "work for free" number?
Who are you kidding with all this "volunteer" stuff?
You have to have at least a grain of honesty.
No wonder the very first thing that came to your mind was "or maybe these articles were stolen from the Library of Congress"?
And what did I ask you?
Well, sir, why was stealing the first thing that entered your mind? Are you a thief? Because any psychologist knows these simple things.
When I describe some specific issue, instead of even looking at it, what do you do instead? Well, you try to project some guilt and fear onto them, so they get into a defensive position. Because once you are in a defensive position, you start dissipating lots of energy, and become vulnerable to all sorts of "practical suggestions" and "recommendations", one of the first of which is "well, you really do need some COMPETENT SEO to fix your 'problem'", wink, wink.
Isn't that how this story goes?
Tell me where I am wrong? Actually, it does not matter. I am done with you. So far, ALL you, personally, were able to produce is the ugliest and most destructive things, and it is simply mind-boggling that Google allows this kind of stuff on these forums, where many webmasters come, some crawling on their knees, because they see their whole operation falling apart before their eyes.
Top Contributor - Webmaster Help Bionic Poster - 7/6/09
1 person says this answers the question:
>>> Furthermore, I would like to mention this:
>>> Several threads related to specific issues related to this thread were removed.
>>> WHO did a removal?
A caring soul :-)
>>> Why?
Because you started to insult people in a very aggressive and really ugly manner; try to do that in real life, if you dare, but don't complain if you get slapped in the face right away then.
>>> For what exact purpose?
To spare others your disgusting rants.
>>> I would also like to mention the fact that it seems totally inappropriate
>>> to allow non Google authorised personnel to be able to remove threads,
>>> especially in the context of conflict of interest issues and the very fact
>>> that the issues, addressed on these threads lay WAY outside of their
>>> level of competence.
My level of competence in recognizing a troll is quite high.
>>> I want to see a solution that will allow my users to find
>>> the MAXIMAL number of articles that do exist in Google index
Ah, finally you've managed to come up with something (took you a hundred posts and more) that could explain your interest in this. Still, your users won't care a bit about these numbers, they don't need 'em ^^
-luzie-
It is none of your business what my users care or do not care about, or how I am going to make it easier for them to find what they are looking for, and all sorts of other things like that.
Secondly, you are not God, omniscient, knowing the outcome of this and that. All you have is your own limited view on things, and a biased one at that, and of necessity biased.
So, keep your ideas to yourself and recommend them to YOUR customers. I am not interested in discussing the customer issues or why would anyone do this thing or that.
I am interested in getting to the bottom of this, and I will, I ALWAYS do. Because I can not simply leave anything incomplete. That is just a nature of a technical mind. Hope you can appreciate that.
If you have some SPECIFIC ideas on SPECIFIC issues, and you ARE competent enough to understand what you are talking about, and that is search engine INTERNALS, then fine. Otherwise, sorry, not interested.
One more thing: this is a thread to discuss MY issues, not yours. Spewing these kinds of ugly garbage is the same thing as hijacking a thread, contributing nothing to equation and simply sucking energy.
Not good for your Karma Marma Dharma Bharati Bati and Pati. Plus you may need to pay your shrink more money in the end. Who knows what might come of it?
Top Contributor - Webmaster Help Bionic Poster - 7/6/09
1 person says this answers the question:
> Ah, finally you've managed to come up with something (took you a hundred posts and more), that could explain your interest in this.
Yup. Probably told some advertiser somewhere that he's got gazillions of pages indexed and the truth is that it's never been more than a couple of thousand.
And even the ones that are indexed will probably never get displayed in the SERPs - they have to pass duplicate content filtering first.
Well, it looks to me like the search engine is FULL of bugs, unless something else is going on that is even worse.
The SERPs behaviour is totally inconsistent in the most observable ways.
The estimated number of articles indexed is totally wrong, to the point where it jumps WILDLY from one page to another in a totally unreasonable fashion.
These numbers can not jump like this in a properly functioning system, no matter how close those estimates are, especially within a short period of time, when the DC updates can not possibly produce such wild variances so quickly.
The original figures, stated in the very first article on this thread, are still pretty much the same, within reasonable variations as the DC updates go on.
Nevertheless, these two utterly stable values of 3,000 and 3,000,000 articles are totally unexplainable, especially the tendency of the lower number to come up in well over 90% of all cases.
The results of doing a site:my.domain search can not possibly be smaller than those of a "search_string" site:my.domain search, under ANY conditions, just as explained in one of the previous posts.
You can not possibly have the results that have already been published above.
No wonder people come here all the time asking "where did my pages go", and they are given ALL sorts of totally off the wall "answers" that do not bear any resemblance to any kind of reality of a properly functioning system.
Sorry, but the conclusions are simply inevitable.
Furthermore, there can not possibly be 296 articles indexed by Google for cppgoldmine.by.ru site.
Search: site:cppgoldmine.by.ru
Results 1 - 10 of about 296 from cppgoldmine.by.ru. (0.08 seconds)
Because, in a single chapter, or in a single keyword search, the same search engine produces tens of times more results:
Search: threads site:cppgoldmine.by.ru
Results 1 - 10 of about 1,730 from cppgoldmine.by.ru for threads. (0.17 seconds)
which is 6 times as many as the whole site has, according to the previous result.
And this one simply blows all these theories by the "experts" around here to pieces.
Search: thread site:cppgoldmine.by.ru
Results 1 - 10 of about 3,270 from cppgoldmine.by.ru for thread. (0.39 seconds)
Things like these can not possibly happen in a properly functioning system, under ANY conditions whatsoever.
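To make the set logic explicit, here is a toy model (my own illustration, not Google's internals): a keyword search over a site is a filter applied to the site's page set, so its result count is bounded above by the bare site: count. The page paths and texts below are made up.

```python
# Toy site index: page path -> page text (made-up data for illustration).
site_index = {
    "/Convert/Articles/Abstract_Class_Code/index.html": "links abstract class",
    "/threads/t1.html": "thread about dynamic_cast",
    "/threads/t2.html": "another thread",
    "/misc/notes.html": "no keyword here",
}

def search(index, keyword=None):
    """Return pages matching the keyword; no keyword means the whole site."""
    if keyword is None:
        return set(index)
    return {page for page, text in index.items() if keyword in text}

all_pages = search(site_index)               # like site:my.domain
thread_pages = search(site_index, "thread")  # like: thread site:my.domain

assert thread_pages <= all_pages             # filtering can only shrink the set
print(len(all_pages), len(thread_pages))     # 4 2
```

Whatever the index internals are, a restricted query is a subset of the unrestricted one; a count of 3,270 for a keyword query against a 296-page site: count is therefore internally inconsistent.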
Meanwhile, the conclusions about every single "explanation" provided by these so-called experts, and the very nature of those "answers" on this very thread, are simply inevitable, just as described in:
Furthermore, the number of links to the Programmer's Goldmine collections as shown on the Webmaster pages is TOTALLY wrong, by hundreds, if not thousands, of times.
And, therefore, so is the "rank", which is currently low for every single site; and since just about the heaviest weighted component of the rank is the number of backlinks, just as the original Stanford research paper on the Google search engine architecture shows, the positioning of these collections is totally and completely wrong, and I am willing to stand in a court of any land to testify to that, if need be.
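The Stanford paper referred to here describes PageRank, where a page's rank is driven by the link graph. A minimal power-iteration sketch of the published formula, PR(p) = (1-d)/N + d * sum(PR(q)/out(q)) over pages q linking to p, using a made-up four-page graph (my own illustration, not anyone's actual ranking code):

```python
def pagerank(links, d=0.85, iters=50):
    """Power iteration over a link graph: page -> list of outlinked pages."""
    pages = sorted(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {}
        for p in pages:
            # Sum contributions from every page q that links to p.
            inbound = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - d) / n + d * inbound
        pr = new
    return pr

# Made-up graph: three pages linking to "home" boost its rank.
graph = {
    "home": ["ch1"],
    "ch1": ["home"],
    "ch2": ["home"],
    "ch3": ["home"],
}
ranks = pagerank(graph)
assert ranks["home"] == max(ranks.values())  # most-linked page ranks highest
```

This toy graph has no dangling pages, so the ranks stay normalized; it only illustrates the claim that backlink counts dominate the rank in the original formulation.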
Top Contributor - Webmaster Help Bionic Poster - 7/8/09
2 people say this answers the question:
:rolls up sleeves: :cracks knuckles: :grins:
I'm so gonna love this - soooo much, that I'm going to take my time replying. I'm going to savour every single moment as I pound you as you deserve. Let's see you immortalise this, you * muppet.
.
1) This is Not a representation of Google. This is a fair representation of the Volunteers that help/support hundreds/thousands of Genuine people with Real problems - responding to a jackass without a clue. Further - what the OP has Not told you about is the various Other Topics they created - in which they were rude, abusive and nasty to other members of this group.
2) I very much doubt that anyone from Google can touch this topic - as it's simply a Flame. The closest a Google Employee has come is to select a previous answer (around 1/2 way into this topic now) as Best Answer.
3) Numerous people have still gone to the effort of attempting to help the OP understand the various aspects/issues involved - but the OP (as you can see) has blatantly ignored it - ALL of it! The OP is NOT interested in realising they are incorrect (WRONG) in their summation of how things work - nor in the fact that they are misunderstanding and making assumptions. Worse than that though - the OP refuses to accept that their method of Searching for the comparative data is very wrong. (Those who have managed to read through the topic will of course have realised that the OP is a complete * * without a clue, despite all their protestations)
That's right people - not only is the OP a completely offensive individual who, though they strongly claim otherwise, doesn't have a clue about how the data is handled, who refuses to accept that the way they search is wrong, who cannot acknowledge that they have been given help/assistance and is just too stupid to accept the simple fact that they are wrong ... ... they also have Duplicate content - which we ALL know causes Filtering and thus results in the SERPs figures being even more out of whack than they usually are.
Then afterwards - I expect to see an apology on there to a) Google for misrepresenting them (Bad things to be doing! They have way more money than you do sunshine!) b) all the people you verbally abused and offended in the other topics you decided not to tell people about c) Your readers - for making them trawl through the ravings of a stupid troll without a clue and a bad attitude.
Do NOT mess with the Wookie :D
Top Contributor - Webmaster Help Bionic Poster - 7/8/09
And yes dear readers - I too have obtained a full copy of the entire topic. This has been done as we can all tell that the OP is likely to ditch and burn.
Top Contributor - Webmaster Help Bionic Poster - 7/8/09
I also strongly suggest that you go and Block that copy of yours. As it currently stands - you have an exact Duplicate (is that a habit?) and technically - that could also be seen as a breach of copyright.
So Noindex/robots.txt to prevent the bots from crawling/indexing it, and put in some content explaining that you have made an unauthorised copy and for what purpose - placed at the top and bottom.
(Anyone know if Google does the DMCAs? If so - where can I read them - as I'd love to watch this one get slapped with a DMCA from Google - that would just be perfect .... or should we simply file Spam Reports for stealing content? :D)
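The "Noindex/robots.txt" suggestion can be sanity-checked with Python's standard urllib.robotparser; the robots.txt contents and URLs below are made up for illustration, not taken from the actual sites in the thread:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a mirror site: block the duplicate copy
# while leaving the primary content crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mirror/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The mirrored copy is blocked; the primary content is still crawlable.
print(parser.can_fetch("Googlebot", "http://example.com/mirror/index.html"))  # False
print(parser.can_fetch("Googlebot", "http://example.com/Convert/index.html"))  # True
```

Checking the rules this way before deploying them avoids accidentally blocking the pages that are supposed to stay indexed.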
Top Contributor - Webmaster Help Bionic Poster - 7/8/09
1 person says this answers the question:
pgelqued,
I've gone through the labour of reading the great article you've written about your deleted thread "Google index drops like a rock" and am quite sad now that you didn't mention me there :-( Would you please include some insults against luzie too, I need that for my reputation.
Furthermore, I found an error in your article I'd have to point out. You write a TC can "ban anyone he likes" ... unfortunately this is not true (yet) - as your presence here proves - otherwise I would have banned you long ago.
And now back to the topic. You've finally given away the true motive for everything you've said so far, look at this:
... the positioning of these collections are totally and completely wrong,
Hooray! Your sites rank badly? That's all the whole fuss has been about? You need a hundred posts in half a dozen threads to tell me your sites don't rank as you expect them to? I've seen thousands of posters do that in a single sentence.
... and I am willing to stand in court of any land to testify that, need be.
No-no, there's no need for that, really.
-luzie-
Do you want to get famous also, and be known to the cream of the crop of programmers worldwide? Fine, produce something as deserving as Autocrat does.
As far as "duplicate content" on a help page goes: yes, the exact same page is used for all the sites and mirrors, and it is quite fine that Google indexes only one copy of it. Not a problem at all.
As far as ANY kind of copyright issues anyone might have, fine, do as you please. Not a problem at all. To the best of my knowledge, there are no copyright issues related to these collections, and if there are any, ask Google and see what they tell you.
How unfortunate that you can not ban anyone <b>YET</b>, and if what you are saying is true, fine, that correction will be reflected in the latest version of
and all the copies of it that exist on all the sites.
Btw, if you find some other issues you think are incorrect, fine, we can look at that.
One more thing: can you guys tell me what the nature of the relationship between the "volunteers" and Google is, as far as any kinds of contracts or agreements go, written or otherwise?
Sorry, your performance is not impressive enough to warrant any kind of response at the moment. Everything has been covered either in this very thread, or other threads that address the issues discussed in this thread.
I think your picture on your favicon is much more impressive. Btw, was that YOU in your better days?
As to me being as "wrong" as you describe, I think we have already addressed every single argument you had, point by point. One thing to keep in mind is that when I say "Not applicable to this case", well, that is exactly what it means, and it is not rocket science to see why exactly it is not applicable to this particular situation, no matter how big a pile of confusion you are trying to make out of it by adding more and more variables into the equation, not even realizing they do not correlate to the sample data provided. You see, if the DC updates or the entire operation of various search engine subsystems were as confusing, complex, contradictory, etc., as you describe here, the whole search engine would cease to operate, even as badly as it does at this very junction.
What I do have to admit is that Google did achieve some spectacular results in terms of response time to search queries, considering the size of the various databases. Beyond that, sorry, it does not impress me personally. By far.
As far as who insults and harasses who, it is all on record. So as you please.
And no, none of your "answers" actually answer anything as far as the exact and specific issues discussed on this thread go. Not a single statement of yours proves anything.
Furthermore, you guys seem to have this tendency of always drifting off topic and sliding into the domain of insults and ugly concoctions. Understood, by now it is a habit for you. As far as I know, all the people that have the authority to abuse things, such as deleting threads, do abuse it.
"Power corrupts, and absolute power corrupts absolutely". Not sure if you even know who said that or even heard this in your lives.
As far as I can see, there is no harm in any thread to exist, unless it clearly violates some law and places LEGAL responsibilities on Google to remove it.
Otherwise, it is just a different view of the issues and you never know who and for what reason would like to be able to see such information. Because as it stands NOW, what you are doing by deleting someone else's articles, and even threads, is what is called a destruction of evidence.
Are you trying to cover up the tracks, so no one could see what actually took place on those threads you have destroyed, by ANY chance? Because I WOULD like to see all those threads you have destroyed and make my own opinion about it, and I have a feeling that reading those threads would provide me much more specific technical data to make a better conclusion as to what is exactly going on in my own situation.
Sorry, I do not mean to be insulting, but what I do find is that not a single response I have seen from any of you, "experts" so far, does actually correspond to exact issues we discuss here.
If ANY of you have specific issues you think I did not either prove in sufficient detail, or interpreted incorrectly, please do point them out and I promise to give you as detailed of a reasoning as you might see ANY place on this planet, short of talking to top architects at Google.
Otherwise, there were so many various statements made by you, "top contributors", and so few of them actually had to do with the exact issues and hard data presented in the original thread, that it would be totally unproductive for me to waste hours discussing points that are so obviously wrong, in my opinion, that there is nothing to even talk about. To me, it is like saying 2 plus 2 is not equal to 4.
Interestingly enough, one IBM researcher stated:
"One thing is certain: 2 plus 2 NEVER equals 4!"
Wow! How do you like THAT one for breakfast? And you know what? Well, strangely enough, that was more or less the foundation of the Fractal Theory, and now the leading 3D-related software is using those principles in the most advanced rendering and animation packages, and all the effects you are seeing in the movies are, in one way or another, the result of that original work.
So, be my guest and present the specific examples of your answers and I promise to give you a detailed reasoning, unless they are so off the wall, that I would not even consider any of it in my wildest dreams.
Now - take your time, (and if you have problems following this, just let me know). We are going to break those into 2 parts. We will break them down into Domain and Path. Thus we will have
So we have 2 Domains, and 2 Paths. Still keeping up with me here? Okay ... I'll wait whilst you stumble along. [wait for muppet to catch up with the whole 2 steps I've just taken] [still waiting] [ah ... finally]
So, we have 2 Different Domains .... but wait - what's that .... the Paths look the same! Gosh!
Surely not though. Must just be a coincidence. Sure, you just happen to have used similar Paths - and have different content, right? I know - to be fair - we will do some "sampling". That means we will pick out certain bits from one, and compare them against the related bits in the other. Okay - I see that confused you. In baby speak - I will look at 1 site, and look at the first link ... then I will look at the 2nd site, and look at the first link there - then we can see if they are different. Make a little more sense now? Good. Then we can repeat it ... let's say we do 1, 5, 10, 15 and 20, yes? Won't this be fun. Now hold my hand like a good boy, and let's go look.
http://cppgoldmine.uuuq.com/Convert/Articles/Abstract_Class_Code/index.html 1 - 1st Link) Links in C++ Abstract Class Code articles 2 - 5th Link) '*' cannot appear in a constant-expression problem 3 - 10th Link) [Q] Strange dynamic_cast problem 4 - 15th Link) A thinking about implement with interface and analysis problem by set theory. 5 - 20th Link) abstract base class containing class
http://cppgoldmine.by.ru/Convert/Articles/Abstract_Class_Code/index.html 1 - 1st Link) Links in C++ Abstract Class Code articles 2 - 5th Link) '*' cannot appear in a constant-expression problem 3 - 10th Link) [Q] Strange dynamic_cast problem 4 - 15th Link) A thinking about implement with interface and analysis problem by set theory. 5 - 20th Link) abstract base class containing class
Oh no - they are the same!
But wait - okay - maybe you are just very Very anally retentive! Maybe you just utilise the same structure and labelling system. (That means you call All your teddy bears Rollo)
Oh dear - it appears you have been a complete and utter * muppet doesn't it. Yes you have.
Silly * ...pgelqued... has got the exact same content on 2 different sites. What does that mean? Well - how do I explain this to a child? Hmmmm..... Let's say you have 2 copies of the same book. That's okay - we know that if we pick up one of those books, it will be the same as the other book, yes? Now, imagine how confused you would get (doing anything!) if one of those books had a different cover on it! Yes indeed - you would get confused, wouldn't you! Well, you have done something very similar here. You have 2 versions of the same thing - but called them different names.
But that is not all you have problems with is it ...pgelqued... ? No - copies of the same thing between your own sites just wasn't enough was it? No.
That's right, ...pgelqued... the * * muppet of a * is honestly expecting to rank for content that is not only shown on 2 different sites that he owns - it also appears on numerous other sites!
So, on top of having a seriously questionable "formula", as well as not including several factors, and using the wrong search criteria, you are ALSO duplicating content across mutually owned sites and replicating content found on other sites.
Go on - keep arguing. Everyone can see you have not got a clue, and are just arguing as a rearguard in an attempt to stop yourself looking like a complete and utter * imbecile.
Hate to tell you this sunshine - you're a little too late.
Top Contributor (Webmaster Help Bionic Poster), 7/8/09 - 1 person says this answers the question:
As to your question regarding "... what is the nature of the relationship of "volunteers" and Google as far as any kinds of contracts or agreements go, written or otherwise? ..."
There isn't really any. We show up. We try to help out. Over time, you get "points". Over time, with exemplary behaviour and a record of knowing what you are on about, you "may" get asked if you would like to be a Top Contributor.
That's it.
Just relax. This isn't exactly the end of the world. This issue HAS been discussed already, and I did mention more than once that these are the EXACT same sites. Every single site has at least one mirror, and that is exactly how I want it, except it wouldn't be a bad idea to add more mirrors.
This is used for reliability purposes. EXCLUSIVELY. There is no benefit whatsoever from having a mirror, because when a user finds some article, it does not matter which mirror he is accessing. The total page impression count is not going to change.
If anything, Google has to make sure it can handle these kinds of situations without penalising mirrors all over the world.
Secondly, this is the way it has been for almost a year, and there has never been any kind of problem as far as sites being properly indexed by Google. This is a recent phenomenon.
Google can EASILY determine that these are in fact mirrors and there is no trick of any kind used to bump up anything.
Furthermore, ALL these sites were created via the same exact account on the webmaster pages. Nothing was done to even attempt to "cover up the tracks". The simplest domain name analysis shows that these are the exact same sites. The index pages are exactly the same, and so are all the chapter index pages for every single chapter, and so is every single article on each of these sites (assuming we are talking about the same collection).
And if you think THIS could be considered some kind of "evil", then what is going to happen is that some of the biggest repositories of information in the world are going to just disappear from the Google index, and one cannot even BEGIN to comprehend the consequences.
In summary, there is absolutely nothing "wrong" with having mirror sites. Google has been given so many clues as to what they are that it is the simplest thing in the world to make the proper adjustments in the Google index.
Furthermore, Google ALREADY has provisions to handle these kinds of situations via the indented "Similar pages" links.
What happens if someone does a search, clicks on a link, and that site is offline for all sorts of reasons? Well, then they have the option of clicking on the mirror link if they think the article has the potential to answer their question.
But ALL of this has absolutely nothing to do with the issues on the table. Again: the issue is how come Google SERPs show WILDLY different results, as much as 1000 times apart, for the same exact search under the same exact conditions, from the same exact machine? Just by clicking on the Search button, you may get either 3,000 articles or 3,000,000. How is THAT kind of thing possible?
And how come those results are stable over a period of weeks, within the reasonable and expected normal variation, as various DCs get updated continuously?
As to the "questionable formula", you need to keep things in perspective. We are talking about the SAME exact conditions and the same exact search. And I insist: the results should NEVER, under ANY conditions, vary so wildly in a properly functioning system, REGARDLESS of what kind of DC update is going on and what the views from various DCs are of the same exact site. Yes, there ARE some normal variations, because all DCs may have different data at the exact same time and are not sufficiently synchronized. But that has been taken into account in my analysis.
Simple as that.
Finally, this "sunshine" language is clearly offensive, just like every single response of yours. It is getting kinda boring. Can you come up with something more creative?
One more time: to this very moment, you were unable to present a SINGLE argument, that has ANYTHING to do with the exact problem we are talking about.
We are not discussing the copyright issues on this thread. If ANYONE has ANY copyright related issues with these sites, they know what to do, better than you.
Do YOU have a problem with copyright issues?
Why don't you listen carefully to what is being said?
One more time: TALK TO GOOGLE ABOUT THESE COPYRIGHT ISSUES AND SEE WHAT THEY TELL YOU. And that is if you can even manage to talk to Google to begin with.
Thank you for answering my question regarding relationship to Google.
That tells me that you are not authorised to make ANY kind of representations on behalf of Google, its search engine mechanics, or search engine internals.
ALL your opinions are just that: opinions, based on rumors.
Otherwise, I could just look up the exact description on some Google page that describes the mechanics in no uncertain and very specific terms.
Why did you have to waste all this time doing the most useless thing possible, trying to "prove" that these are mirrors? This has already been discussed, and not just once.
Secondly, it has absolutely nothing to do with the exact issues on the table, and with claims supported by the hardest evidence there is, the screenshots - at least given my lack of access to Google's internal information and databases.
I do not deny that you have plenty of useful experience and know all sorts of tricks that are useful from SEO standpoint, and I am not questioning any of it, or trying to discredit the value of what you do, regardless of what your underlying motives are.
I am not saying you are just an idiot that does not have a clue. At the same time, I do not see you even looking at the data from the appropriate angle, nor do you truly understand what you are saying as far as search engine mechanics go.
Without bragging too much as to how "great I am", I can tell you this much:
I do understand what I am saying, and I do understand distributed systems, synchronization issues, database theory, transaction processing, the nature of information and search engine internals, the principles of file system design, and PLENTY of other things that ARE applicable to the exact issues we are discussing here, down to the nitty gritty of it.
Just to put a fine touch to all this "you are just dumb" stuff you are constantly blabbering, I can tell you this: I consulted for some of the biggest names in the industry, such as HP, Intel, SGI (Silicon Graphics), Amdahl, Fujitsu (HAL Computers), Lynx and a few others, on a contract basis, doing kernel level development work and being paid money you can only hope to ever make, and all the projects I did for them were done on time and with the best results possible - in MOST cases, results that exceeded their normal expected results several times over.
Hope that rings at least some bells in your cockpit.
Top Contributor (Webmaster Help Bionic Poster), 7/8/09 - 1 person says this answers the question:
Just keep on going. At the end of the day, this topic ranks for your sites. Everyone can see what you are like, and will realise that you are clueless.
If your intent was to harangue Google and make a name for yourself, congratulations - you partially succeeded. Rather than causing Google a nightmare, you got us. Yet you did make a name for yourself - and it's quite long and unpleasant.
You are a laughing stock. Your complete inability to understand all the above points that can/do influence the results seen, and your various incorrect assumptions, show you as nothing more than an idiot.
so please - keep it up. It's your reputation, your sites etc.
None of us care.
I've been the only one to make a serious effort to assist you - and I'm tired of it. So now you can sit and suffer. Watch your results fluctuate, vary, and dwindle (as they will!) over time.
I know I'm going to enjoy watching it :D
Top Contributor (Webmaster Help Bionic Poster), 7/8/09 - 1 person says this answers the question:
Sorry buddy, but that's NOT how you are supposed to implement a distributed system.
You don't allow multiple sites to be indexed for the same thing.
One site only will be indexed and you direct traffic by a proper load balancing method to whichever server happens to be available, less busy, etc, in a totally transparent manner to the visitor (be it human or robot).
Of course this is not a cheap method. It's not at the disposal of every Tom, Dick and Harry to use. I cannot afford it. If my server is down or too busy, I can only wait until it's sorted. I don't have the resources to go fancy. But I would never set up mirror sites and let them all get indexed, this is idiotic in the extreme. You're cutting your own branch by introducing so much superfluous material that all of it will be trashed to varying degrees. Think quality vs quantity. Think long and hard about that.
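The load-balancing approach described above - one indexed hostname, with traffic silently distributed to whichever server is up - can be sketched as a minimal round-robin selector. This is an illustration of the idea only; a real deployment would use a DNS or reverse-proxy layer, and the backend addresses here are made up.

```python
from itertools import cycle

# Hypothetical backend pool: one public hostname, interchangeable servers.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

class RoundRobinBalancer:
    """Send each request to the next backend in turn, so the visitor
    (human or robot) only ever sees the single public domain."""

    def __init__(self, backends):
        self._pool = cycle(backends)

    def pick(self):
        return next(self._pool)

lb = RoundRobinBalancer(BACKENDS)
first_four = [lb.pick() for _ in range(4)]
print(first_four)  # cycles back to the first backend on the 4th request
```

The key property for indexing purposes is that all backends sit behind one hostname: the crawler sees one site, so there is nothing to mark as duplicate, while availability is still covered by multiple servers.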
And this doesn't even begin to address the other problem: your published content, even if it existed on only ONE of your sites, ALSO exists verbatim on many other sites on the web which are not controlled by you (if they were, that would just add to the same issue as your mirror sites). You are aggregating content from many sources. This therefore means it's not original (not original to your site), not unique, and ultimately doesn't deserve to be indexed in the minutest detail, and even less to rank at all.
But there is a pretty interesting twist on all this.
The issue is public domain information. Who, and by what criteria, is to be included in the index if ALL of them have the same articles? Well, according to Google, the biggest and the baddest. But do they "own" ANY of it?
Secondly, consider this: most, if not all, of the sites I am aware of that carry the same exact information simply have it in bulk. Searching THAT database will produce pretty poor performance in terms of the appropriateness of an article to the search query, and the very quality of the articles that show up in a search, if some particular site has priority over all others.
For example, if you are interested in an "abstract class" issue and would like to see some specific code examples describing some pretty subtle points, when you do a search you may get ALL sorts of articles on abstract classes, and about 99% of them will be totally off base as to the original query.
But... If you do not simply include the entire historical archive on the site, but filter the information with sophisticated filters, what YOU are going to have as a result is ORDERS of magnitude more precise and to-the-point information than anyone else, including Google, Yahoo and Microsoft.
But... You may never even get indexed if Google considers it "duplicate content", despite the fact that you have the best collection of the most on-topic articles, with the most extensive collection of code examples on ANY issue conceivable that exists on the face of planet Earth at this particular juncture.
And I can tell you more: Google itself cannot possibly produce anything equally precise, no matter what they do, because their filtering technology is totally outdated in its most fundamental principles. They use methods and techniques that have been known since the 1960s. Yes, they did improve some things, but only incrementally. They cannot possibly find the information we can find, and for them to be able to do that, they would have to spend several years totally changing the entire architecture, because very little of their current technology is capable of utilizing these principles - which, for obvious reasons, I am not going to discuss in detail.
So...
What IS the benefit of the current view of information? WHO is to be shown in the search results, and according to what criteria and principles?
Yes, this issue does have relevance and there is no easy solution, even from a logical standpoint, and yes, there is no point in including every single copy of the same article. Otherwise, the users will simply be overwhelmed.
BUT...
This is not the end of the equation. It is just the beginning of it. Just take, for example, the simplest issue of article formatting. If you look at the biggest players in the world, their article formatting is not necessarily the best you can find. With their current formatting, it is much more difficult to see who says what and who responds to whom, etc. Plus, as far as pure formatting goes, they just copy the original article, with all those weird special characters, to the output page. If you look at the exact same article in my collections, the difference is night and day.
And again, if you do a search on code examples for abstract classes in the Microsoft or Google database, you are going to get hundreds of thousands of articles, the majority of which do not even have anything to do with abstract classes as such, because their very technology is not capable of fine and sophisticated filtering.
The very traffic on the goldmine sites is a living indication of the validity of such an approach. Penalising these new technologies with the methodology used for dealing with crooks can hardly produce benefits.
Btw, I saw your article with recommendations on things to do to make sure you are treated fairly by the Google search engine, and I find it one of the best I have seen, including all the information Google provides on various pages, which has very little practical value and is too vague to even pay attention to.
And the way your information was presented is exactly point by point. A very good article, and in the context of the utterly useless Google documentation, it does shine indeed.
Wow, so Microsoft is doing searches on MFC using mfcgoldmine.uuuq.com and doing chapter searches directly on some pretty fancy things? Not only that, but reviewing several chapters? And Intel and AMD too?
Hey, now we are talking!
Btw, dear Microsoft, if you ever see this post: I see what you are looking for. We'll make a new chapter for this kind of issue. I had not thought of that one. But you can still find most things via the CDialog chapter, just as you are doing.
You know how to find me, right? Just go to the help page, it has all you need.
While Google is scratching its head, I have made you a couple of chapters on exactly what you were looking for, with code examples and expert articles.
I bet you this is the best stuff on the net on these specific issues. All nicely sorted, organized, formatted better than you and Google can do, and squeaky clean. Check it out.
Command bar code chapter: http://mfcgoldmine.uuuq.com/Convert/Articles/CommandBar_Code/index.html
Command bar experts chapter: http://mfcgoldmine.uuuq.com/Convert/Articles/CommandBar_Experts/index.html
Plus I am going to make 4 more chapters on toolbar and status bar.
We can beat that big G hands down. Right now, we are only on the 4th order; we can go 5 orders higher right out of the box, which will allow you to find a needle, not just in a haystack, but on the Moon. :--}
Sorry for the inconvenience, but the chapter names have been changed to comply with the MFC class naming convention, so those chapters no longer exist. The following are the correct and latest versions:
Command bar code chapter: http://mfcgoldmine.uuuq.com/Convert/Articles/CCommandBar_Code/index.html
Command bar experts chapter: http://mfcgoldmine.uuuq.com/Convert/Articles/CCommandBar_Experts/index.html
Status bar and menu chapters are coming up shortly.
So... Now you can do any kind of control, status or other bars your soul desires in minutes. No need to waste days trying to dig up code examples or trying to find a competent opinion on this stuff any longer. Plus you get several alternative approaches and all possible views on it from the top guns.
Where are those lazy developers? Wake 'em up! Things are happening in a hurry now.
This place is no fun, I tell ya. You are WAY too slow, dear developers. And if I say you've got bugs, you bet you do. Up to you-know-where, if they are just simple innocent bugs... Oh, you think they are not that "critical"? How do you know? Why don't you please explain how this kind of magic is possible in ANY computation theory there is?
And I bet you 1 to 10 you can't. And it is going to take you at least 3 years to get to where we already are, considering the speed with which you solve problems and the type of bugs you have in your code, and I mean ALL OVER.
Just look at your ugly formatting of Usenet articles. Do you even begin to realise that not a single experienced guy is using Google Groups, because it is so screwed up to the point of being unusable? Did you even bother to ask the top guns in software, or any other professional field, what they think about the way you format the articles and about your user interface? Or could you not care less, just like in this particular case? You think YOU have the cream-of-the-crop developers, right? WRONG!!!
This is the latest and greatest version of Google, NWO style, and the reason we are seeing what we are seeing is that there are bugs in it, and not just one. We were not supposed to see those real numbers. ALL we were supposed to see was the gradual degradation of the Google index, which would immediately be "explained" by these "Top Contributors", and the whole nine yards of "explanations" would be provided to keep you chasing your own tail for months, if not years.
That index was supposed to gradually degrade and settle at a MUCH lower number in about two months, just as we see here, and it would all look like there is a new kid on the block who is so cool that he could overshadow even the sites that are consistently on the 1st page for MOST of the keywords. Isn't THAT how this story goes?
Just read the other threads telling you pretty much the same thing, only in slightly different permutations. "Look, I have a site that was consistently in the #1 position for ten years, and now BOOM!", and on, and on, and on. Not nice, Dr. Google.
But...
Sorry, but it did not quite work out as expected.
So...
I bet you Google will not have any explanation for this completely off-the-wall behaviour, which is evidenced in the hardest possible way under the circumstances: the screenshots. Furthermore, it is even reproducible, and THAT is a total disaster for those masters of disasters.
They will just tell you something like "oh, that is nothing, it does not affect anything", or something of this sort. Oh yeah? How much are you going to bet on it?