Executive Summary

Question

I just found http://programmingfaq.w3ec.com/faq/4761/whats-the-hi-lo-algorithm which has an exact copy of this Stack Overflow question, What's the Hi/Lo Algorithm ^[1], with all its answers and no difference in a single character. There is no reference to Stack Overflow.

Is this legal, or at least tolerated? Is this impertinent theft of information? I just can't see any sense in copying whole pages, particularly when not referencing the source.

[1] http://stackoverflow.com/questions/282099/whats-the-hi-lo-algorithm

Answer 1

Stack Overflow is licenced under Attribution-Share Alike 3.0 Generic ^[1], which states:

Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

I think it is pretty clear that they have failed to do this (or any attribution whatsoever). So IMO (and IANAL) no: this usage is not legitimate. But within the terms of the cc-wiki agreement cited re-use is fine.

Edit: the cc-wiki licensing and attribution policy ^[2] are also linked on every footer page like so.

Enter image description here

If you click through to the attribution policy ^[3] you will find the specifics:

So let me clarify what we mean by attribution. If you republish this content, we require that you:

Visually indicate that the content is from Stack Overflow, Meta Stack Overflow, Server Fault, or Super User in some way. It doesn’t have to be obnoxious; a discreet text blurb is fine.

Hyperlink directly to the original question on the source site (e.g., http://stackoverflow.com/questions/12345)

Show the author names for every question and answer

Hyperlink each author name directly back to their user profile page on the source site (e.g., http://stackoverflow.com/users/12345/username)

By “directly”, I mean each hyperlink must point directly to our domain in standard HTML visible even with JavaScript disabled, and not use a TinyURL ^[4] URL or any other form of obfuscation or redirection. Furthermore, the links must not be nofollowed ^[5].

[1] http://creativecommons.org/licenses/by-sa/3.0/
[2] http://blog.stackoverflow.com/2009/06/attribution-required/
[3] http://blog.stackoverflow.com/2009/06/attribution-required/
[4] http://en.wikipedia.org/wiki/TinyURL
[5] http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html

Answer 2

I do believe the problem is with the following paragraph:

Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

(emphasis is mine)

What's this manner specified? I think this is too subjective. How can you just point at people and say they are ilegal using your content if you didn't specified what exactly this "attribution" means?

The only complete reference about this subject that I found is within an official blog post ^[1] (it even has the website you're concerned about as an example).

I do believe it would made no harm a simple url below the cc-wiki image in SO footer, named "Attribution Guidelines" that are contained in this post ^[2]. Doing this way people have no excuse of "misunderstanding" attribution guidelines since you explicity said what you need to to when using SO content.

[1] http://blog.stackoverflow.com/2009/06/attribution-required/
[2] http://blog.stackoverflow.com/2009/06/attribution-required/

Answer 3

I am not a lawyer...but to me the site clearly violates the terms of use:

there is no link to Stack Overflow on the referenced page
there is no link to the original question
there is no attribution of the author of the question
there is no attribution of the author of the answers

The last two are serious violations of the spirit and letter, and I think that persecution would be warranted.

Unfortunately, the registrant ^[1] is in China, but godaddy.com will probably cut them off at the knees if @[Jeff Atwood] requests it.

[1] http://who.godaddy.com/WhoIs.aspx?domain=w3ec.com&prog_id=godaddy

Answer 4

This seems a bit worrying:

Searching Google for my one and only (woohoo!) Stackoverflow question using unambiguous terms returns the top result as a tuts9.com copy of the SO page. Also clicking this result link gives me a 403/forbidden page from tuts9.com (Google cache page confirms this page is my SO question even though they filed it as a VB/VB.NET question!).

My original stackoverflow question is here:
http://stackoverflow.com/questions/3251191/how-to-prevent-non-repeatable-query-results-using-persistence-api-in-java-se

Google.com results for "how to prevent non repeatable query results using persistence api in java se" :
http://www.google.co.uk/search?hl=en&q=%22how+to+prevent+non+repeatable+query+results+using+persistence+api+in+java+se%22 ^[1]

The SO page is nowhere to be seen instead the top result from Google is:
http://tuts9.com/questions/25608/how-to-prevent-non-repeatable-query-results-using-persistence-api-in-java-se

Looks like Google are eliminating the SO page from the results as if it were spam, instead giving the tuts9.com copy the 'top slot'. Are Google being gamed here? I've sent a feedback message to Google raising this issue (through their web form though so I don't know how much attention it will get). Maybe someone from SO contacting Google directly would have more effect.

Update (2010/09/02): Currently the top 2 Google results from the search above are for tuts9.com followed by this question on meta! The original SO question is omitted for being too similar (too similar to tuts9.com copies? lol). When similar results are included the SO question appears in page 2 of results.

[1] http://www.google.co.uk/search?hl=en&q=%22how+to+prevent+non+repeatable+query+results+using+persistence+api+in+java+se%22

Answer 5

I just found out about this thread while googling our site...

From what I understand from CC-by-SA ^[1] we comply with everything the license permits.

We ALWAYS provide a link back to the original content and a link to the source site's homepage. We always state that we don't own any of the content and state our sources. We just want to help people find good articles. We are a meta-search engine.

We never thought that we would have that much traffic. In the next few weeks, we will reduce the number of articles coming from Stack Overflow and Server Fault and will have parterships with good publishers in order to have great blog articles.

Things move so fast on the internet that we have gotten caught in a big spiral. That's why we need ads in order to have access to better servers and invest time into the site to make it better.

Our goal is to be a good meta-search engine and blog mashup. We need good articles, so do not hesitate to contact us if you want your site included.

[1] http://creativecommons.org/licenses/by-sa/2.5/

Answer 6

answermoz.com is copying questions without attribution. The copy of a question at answermoz.com appears in Google searches ahead of results from money.SE (try searching on the question titles below.)

Example 1
http://money.stackexchange.com/questions/3461/what-differentiates-index-funds-and-etfs
http://answermoz.com/what-differentiates-index-funds-and-etfs/

Example 2
http://money.stackexchange.com/questions/3118/is-my-credit-score-of-766-lower-than-it-should-be
http://answermoz.com/is-my-credit-score-of-766-lower-than-it-should-be/

Answer 7

Hi folks. I found this answer today:

http://www.go4answers.com/Example/non-nullable-columns-db-becomes-101286.aspx

It looks very much like a Stack Overflow question, but it could be it has nothing to do with Stack Overflow. I thought I might just pop something quickly, to-be-sure (/said in an Irish way, with no offence to our Irish friends).

Answer 8

It seems google has created this tool to address and combat just this problem: Personal blocklist ^[1]

The personal blocklist extension will transmit to Google the patterns that you choose to block. When you choose to block or unblock a pattern, the extension will also transmit to Google the URL of the web page on which the blocked or unblocked search results are displayed. You agree that Google may freely use this information to improve our products and services.

I guess if you want to combat this thing, the easy thing to do is to use this extension. Many people blocking a site is sure to get their attention, thereby removing it from search results, thereby solving most of the problem.

[1] https://chrome.google.com/webstore/detail/nolijncfnkgaikbjbdaogikpmpbdcdef

Answer 9

What about stackmobile? They say they are not affiliated with stackoverflow, so it's clearly not an official mobile version of SO.

They scrape all SO sites, including all data about users, badges, etc., yet they don't provide direct link to questions nor do they provide direct links to profiles on SO sites.

They also don't mention creative commons license anywhere on their site.

Example: http://stackmobile.com/view_question.php?site=stackoverflow&id=4587642 ^[1]

Is it OK for them to scrape SO sites like that?

[1] http://stackmobile.com/view_question.php?site=stackoverflow&id=4587642

Answer 10

Disclaimer: I am not a lawyer.

Executive Summary

If you want to post content from a single answer or a single question, you need to contact the person who posted it as they own the rights to that content.

If you want to post a question/answer combination then you have to follow the terms written in the Stack Exchange Terms of Service.

Individual submissions ('posts') to the Stack Exchange Network (SE) are owned by the person who created them. Individual submissions are licensed only to SE and any reposting elsewhere by non-SE entities, regardless of attribution, are grounds for issuing a DMCA takedown notice.

Posts are licensed to SE to use in a Collection/Collective Work which is a new creation that SE holds the rights to. If a Collection/Collective Work (for instance a question and answer combination) is used without following the attribution rules from the SE Terms of Service (TOS), then SE is able to issue a DMCA takedown notice for that content.

General Overview of the License

Content is owned by the content creator. The content creator licenses that content to SE under the terms of the Stack Exchange TOS ^[1] with the following restrictions:

SE can use the content in all sorts of ways even if the Subscriber decide to delete the content later¹, including the right to claim copyright over the entire content of the Stack Exchange Network as a collective work/compilation²
The Subscriber is responsible for making sure SE can use the content in all those ways³

^{1: "You grant Stack Exchange the perpetual and irrevocable right and license to use, copy, cache, publish, display, distribute, modify, create derivative works and store such Subscriber Content and to allow others to do so in any medium now known or hereinafter developed (“Content License”) in order to provide the Services, even if such Subscriber Content has been contributed and subsequently removed by You."}

^{2: "The Network is protected by copyright as a collective work and/or compilation, pursuant to U.S. copyright laws, international conventions, and other copyright laws."}

^{3: "Subscriber warrants, represents and agrees Subscriber has the right to grant Stack Exchange and the Network the rights set forth above."}

SE combines questions and answers creating a Collection or a Collective Work which they own the rights to. They own the rights to that Collection and re-license it with the following restrictions:

In the event that You post or otherwise use Subscriber Content outside of the Network or Services, whether such Subscriber Content was created by You or others, You agree that You will follow the attribution rules of the Creative Commons Attribution Share Alike license as follows:

a) You will ensure that any such use of Subscriber Content visually displays or otherwise indicates the source of the Subscriber Content as coming from the Stack Exchange Network. This requirement is satisfied with a discreet text blurb, or some other unobtrusive but clear visual indication.

b) You will ensure that any such Internet use of Subscriber Content includes a hyperlink directly to the original question on the source site on the Network (e.g., http://stackoverflow.com/questions/12345)

c) You will ensure that any such use of Subscriber Content visually display or otherwise clearly indicate the author names for every question and answer so used.

d) You will ensure that any such Internet use of Subscriber Content Hyperlink each author name directly back to his or her user profile page on the source site on the Network (e.g., http://stackoverflow.com/users/12345/username), directly to the Stack Exchange domain, in standard HTML (i.e. not through a Tinyurl or other such indirect hyperlink, form of obfuscation or redirection), without any “nofollow” command or any other such means of avoiding detection by search engines, and visible even with JavaScript disabled.

^{Note: There is a question over whether or not
the

rel:nofollow

is a violation of the CC-BY-SA license or not
^[2]}

So Who Owns the Content?

Despite not owning the copyright to Subscriber Content, SE does own the right to the content as a Collection/Collective Work. As stated on copyright.gov ^[3]:

Under the present copyright law, the copyright in a separate contribution to a published collective work such as a periodical is distinct from the copyright in the collective work as a whole. In the absence of an express transfer from the author of the individual article, the copyright owner in the collective work is presumed to have acquired only the privilege of using the contribution in the collective work and in subsequent revisions and later editions of the collective work.

^{This is a normal English explanation of
US Title 17 Sections 102-103
^[4]}

This is also covered under the definition of Collection in the CC-BY-SA license ^[5]:

"Collection" means a collection of literary or artistic works, such as encyclopedias and anthologies, or performances, phonograms or broadcasts, or other works or subject matter other than works listed in Section 1(f) below, which, by reason of the selection and arrangement of their contents, constitute intellectual creations, in which the Work is included in its entirety in unmodified form along with one or more other contributions, each constituting separate and independent works in themselves, which together are assembled into a collective whole. A work that constitutes a Collection will not be considered an Adaptation (as defined below) for the purposes of this License.

This allows SE to manage the rights to the collective work, including issuing takedown notices for content shared through SE even though they are not the owner of the individual works (the Subscriber Content).

[1] http://stackexchange.com/legal
[2] http://meta.stackoverflow.com/questions/209250/se-should-stop-using-the-cc-logo
[3] http://www.copyright.gov/fls/fl104.html
[4] http://www.copyright.gov/title17/92chap1.html#103
[5] http://creativecommons.org/licenses/by-sa/3.0/legalcode

Answer 11

I just found another site that seems to be ripping off StackExchange content:

http://earthwithsun.com/

There is no attribution to either StackExchange or the original posters. They had the nerve to slap a copyright notice on their pages, too. The domain is registered to somebody in Shanghai.