Are there examples of originally widely accepted proofs that were later discovered to be wrong by attempting to formalize them using a proof assistant (e.g. Coq, Agda, Lean, Isabelle, HOL, Metamath, Mizar)?

[17] Proofs formalized so far number in the hundreds, and the vast majority of them are elementary (nineteenth-century material). So I think what you describe is very unlikely to happen. This may change dramatically in the future, though. – Alex Gavrilov, Jan 21 '18 at 4:26

[4] Related: cstheory.stackexchange.com/questions/37299/… – Carlo Beenakker, Jan 21 '18 at 8:04

[45] Proofs can be checked by formalization, but not shown to be wrong. At worst, the (human) formalizer can give up and admit not knowing how to construct the formalized proof. On the other hand, a proof can be shown to be wrong by giving a counterexample (not just for the main result, but also for any part of the proof or subargument). This is in any case how proofs are scrutinized, and it does not require formalization. – Igor Khavkine, Jan 21 '18 at 9:12

[7] The Coq-Combi project proves the Littlewood–Richardson rule imitating the "Plactic Monoid" chapter of Lothaire's Algebraic Combinatorics. But that chapter is known to contain minor gaps and inaccuracies, so they must have been fixed somehow in Coq-Combi. Not sure if it counts... – darij grinberg, Jan 21 '18 at 17:25

[2] @Igor Khavkine: How is giving a counterexample for a subargument of a proof different from demonstrating the proof wrong? And why do counterexamples need formalization any less than proofs do? They might be inconsistent themselves. – Bernhard Stadler, Jan 22 '18 at 18:11
Since this question was asked in January, there have been some developments. I would like to argue that the scenario raised in the question has now actually happened. Indeed, Sébastien Gouëzel, when formalising Vladimir Shchur's work on bounds for the Morse Lemma for Gromov-hyperbolic spaces, found that an inequality had been transposed at some point, causing the proof (which had been published in 2013 in J. Funct. Anal., a good journal) to collapse. Gouëzel then worked with Shchur to find a new and correct (and in places far more complex) argument, which they wrote up as a joint paper.
http://www.math.sciences.univ-nantes.fr/~gouezel/articles/morse_lemma.pdf
The details are in the introduction. Anyone who reads it will see that this is not a "mistake" in the literature in the weak sense described in Manuel Eberl's very clear answer: this was an actual error, discovered because of a formalization process.

[3] I have marked this as the accepted answer now. For future reference, since this answer references the previous accepted answer, I'll add a note here: the previous accepted answer was the one written by Manuel Eberl. – Oct 26 '18 at 5:46

I edited my answer to remove the reference to "the accepted answer" and now reference Eberl's answer explicitly; hopefully this lessens the confusion even more. – Oct 31 '18 at 11:51

[1] The Gouëzel–Shchur paper has now been published in the Journal of Functional Analysis 277 (2019), 1258–1268. – Jul 24 '19 at 19:51
First of all, to explain my perspective: I'm a PhD student working in the formalisation of mathematics with Isabelle/HOL, and I've been working with that system for about 7 years. I was introduced to it in a lecture when I was an undergraduate and I got hooked immediately. I do think it is useful, but I don't do it because of that. I do it because it's fun.
Anyway, your question is a bit tricky to answer because it depends a lot on what you mean by a ‘wrong proof’ and by ‘shown wrong by formalizing them’. A lot of the time, it's something of a grey area.
Normally, one needs a very thorough understanding of the paper proof in order to formalise it, and one has to think of a way of formalising the argument. Conceptual problems with the paper proof often become apparent already at this stage, before any theorem prover is involved as such.
Secondly, of course, if you formalise something like the Prime Number Theorem or Cauchy's Integral Theorem, you're probably not going to find out that it's all wrong and everything collapses. But you might well find problems in particular proofs in textbooks.
I do find a lot of ‘mistakes’ in proofs, including textbook proofs and published papers. Most of these mistakes are easily fixed and most mathematicians would likely brush them off as insignificant. Some take me a few days to figure out. Some actually require changing definitions, adding assumptions, or altering the statement of the theorem.
Most ‘mistakes’ are something like this:
- surprisingly non-trivial arguments being declared trivial or left to the reader
- going very quickly and vaguely over a part of the proof that is perceived as uninteresting, and thereby missing a subtle flaw that would have become apparent had it been done more thoroughly
- missing cases that were probably overlooked by the author
- arithmetic mistakes (my favourite being multiplying an inequality by a constant without checking that the constant is non-negative)
- missing assumptions that are nevertheless implicitly used
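The arithmetic pitfall in the fourth bullet can be made concrete. Here is a minimal sketch in Lean 4 with Mathlib (my own illustration, not taken from any of the formalisations discussed here): the lemma that multiplies an inequality through by a constant carries the non-negativity side condition explicitly, so a proof assistant simply will not let you forget it.

```lean
import Mathlib

-- Multiplying both sides of an inequality by a constant is only valid
-- when the constant is non-negative; Mathlib states the side condition
-- explicitly, so the proof does not go through without `hc`.
example (a b c : ℝ) (hab : a ≤ b) (hc : 0 ≤ c) : a * c ≤ b * c :=
  mul_le_mul_of_nonneg_right hab hc

-- Without `hc` the statement is simply false:
-- with a = 0, b = 1, c = -1 it would claim 0 ≤ -1.
```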
Let me give you a few examples (I won't mention the exact authors; my intent is not to shame anybody for making these mistakes, only to show that this is quite common):
1. I recently had a case where a theorem from a widely used textbook from the 70s was simply wrong, which I realised when I wanted to find out how to formalise it. I am not an expert in that field, but apparently the people who do work in it know that it is wrong.
2. One of the first algorithms (working on non-deterministic automata) that I formalised apparently assumed that the automaton is total (i.e. it has an outgoing transition for every letter of the alphabet from every state). In my opinion, that should absolutely have been mentioned in the paper, but one could possibly argue that it was just implicit in the authors' idea of an automaton.
3. A colleague of mine found a subtle problem with a complicated automata algorithm that had been used in state-of-the-art software for years. It is still not known whether and how this problem can be fixed.
4. In one instance, I had formalised a kind of program-transformation technique from a published paper. The authors then extended that paper to a more detailed journal version and added some new transformation rules. One of them dealt with multiplication by a constant, but they did not realise that multiplication by 0 is a special case that makes their rule unsound.
5. I worked on formalising a new result that had just been published in a journal and found that one part of the proof, which the authors had not explained in much detail due to page limits, had a subtle problem that only became apparent once I had already formalised everything in Isabelle and got stuck at that part. The authors immediately admitted that the problem could not be fixed in any apparent way except by adding an additional, somewhat technical assumption to the entire argument. However, they later managed to prove a stronger result that subsumes the original one, although its proof was much more involved. (More details on this at the end of this answer.)
6. I don't remember the exact details about the Kepler conjecture that somebody mentioned before, but off the top of my head, I seem to recall that several smaller problems were found in the original program code, and that Nipkow found one problem that actually caused Hales to revise a part of the proof completely.
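The unsoundness in #4 is the mirror image of the inequality pitfall: cancelling a multiplicative constant needs a non-zero side condition. A minimal sketch in Lean 4 with Mathlib (my own illustration; the rule in the actual paper concerned a program transformation, not this particular lemma):

```lean
import Mathlib

-- Cancelling a constant factor is only sound when the constant is
-- non-zero; Mathlib makes this an explicit hypothesis.
example (x y c : ℝ) (hc : c ≠ 0) (h : x * c = y * c) : x = y :=
  mul_right_cancel₀ hc h

-- For c = 0 the premise x * c = y * c holds for all x and y, so a
-- transformation rule that cancels c unconditionally is unsound.
```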
As a theorem-proving guy, my reaction to this is "This shows that formalising stuff in a theorem prover is worthwhile." I am aware that mathematicians often have a different perspective. It is not an uncommon view that the so-called "mistakes" I mentioned above are insignificant; that someone would have found them eventually even without a theorem prover; that the theorems were still correct (in some sense) and only the proofs had some minor problems.
However, I disagree with that. I want my proofs to be as rigorous as possible. I want to be sure I didn't miss any assumptions. And I think that things like Kepler's conjecture show that there are instances where it is just infeasible for humans to check the correctness of a proof with a reasonable amount of certainty.
EDIT: As requested, some more details on point 5.
The paper in question is The impossibility of extending random dictatorship to weak preferences. They also published a corrigendum. The purpose of that paper is to show that no Social Decision Scheme (SDS) for at least 4 agents and alternatives exists that is an extension of Random Dictatorship (RD) and fulfils four nice properties.
It works by first showing that no such SDS exists for 4 agents and 4 alternatives, and then showing that an SDS for more than 4 agents/alternatives can be turned into one for exactly 4/4 while preserving the nice properties, so that none can exist for more than 4 either. Typically, in this kind of proof, the base case is the difficult one, and lifting it to a larger number of agents/alternatives is pretty trivial. However, in this case, the property "the SDS is an extension of RD" does not survive the lifting to more agents/alternatives, which completely breaks that step. I myself only noticed this when I had already typed most of the proof into Isabelle and it just didn't go through.
The proof for the base case was based on considering 12 particular preference profiles and was, as you can see, relatively short. The authors later found a proof of the same statement without the RD-extension assumption, but it needed 47 preference profiles and was much longer. I formalised that proof in Isabelle without any problems (see my BSc thesis).

[10] Excellent answer! Your example #5 is interesting enough that I wonder if you could write to the authors and ask if they mind having their identities revealed here. The purpose, of course, is not to shame them (we have all made mistakes) but to provide some concrete documentation about the value of formalization. From this point of view, #5 is much more interesting than #1 to #4 in my opinion (and of course #6 is already documented). – Jan 24 '18 at 16:07

[1] +1. You should probably inform the authors about all of the errors you found, at least when the research is reasonably new. (Though #4 might be a matter of mismatched definitions of $\mathbb{N}$.) – Jan 24 '18 at 16:36

[5] I usually do. Often I never get any sort of reply, but in many cases the original authors are very helpful. #4 was something with real numbers, and it was indeed a mistake. I think the authors were aware that 0 was a special case, but they thought it would still work out in the end. – Jan 24 '18 at 16:41

[7] I updated my answer and included some more information about #5. – Jan 25 '18 at 14:29

[3] Awesome answer, thanks for sharing your experience and knowledge. – Jan 27 '18 at 13:57
This question was raised on the Foundations of Mathematics mailing list back in 2014, and the short answer is no, there are no examples of this. [EDIT: Although this may have been true at the time I originally wrote this, it is no longer true, as other answers amply demonstrate. But I think that this answer is worth leaving here nonetheless.]
The longer answer is that the process of formalizing any nontrivial mathematical argument is likely to reveal some minor gaps, such as degenerate cases for which the main argument doesn't quite work as stated. If you're sufficiently pedantic, then you might claim that in such cases, the original proof is "wrong," but I suspect that this is not the sort of thing you're asking for.
The Flyspeck project did turn up one gap in the original proof of the Kepler conjecture that was large enough that the authors felt the need to write a few pages of human explanation about it. There is also an interesting paper by Fleuriot and Paulson where they undertook to formalize Newton's Propositio Kepleriana with Isabelle using nonstandard analysis to implement Newton's use of infinitesimals. There was one step where Fleuriot and Paulson could not find a plausible way to imitate Newton's reasoning exactly and found themselves having to use a different argument. Again, it is debatable whether this means that Newton's proof was "wrong."

[12] @AsafKaragila: As we know from another MO question, there are cases where a wrong result is widely accepted for some time and even used by others, but so far we lack convincing examples where a proof assistant plays an essential role in the discovery of the error. This may change in the future if proof assistants become sufficiently easy to use (but by the time that day arrives, it may no longer be easy to decide whether to attribute the bug report to the human, or the proof assistant, or both working together). – Jan 21 '18 at 20:12

[4] "If you're sufficiently pedantic, then you might claim that in such cases, the original proof is 'wrong,' but I suspect that this is not the sort of thing you're asking for." No, that's not being pedantic. That's just knowing what 'wrong' means. If a proof doesn't cover every case it claims to cover, then it's not a proof of what it claims to prove. It's wrong. It might be a proof that non-X, non-Y, non-Z somethings have property P, but it's not a proof that all somethings have property P. – Jan 23 '18 at 21:12

[8] @MeiZhang: Thanks for clarifying. With this definition of "wrong proof" the Flyspeck example certainly counts. But let me point out that in this sense, most published proofs are wrong, or at least it's not clear whether they're wrong. E.g., it is common for parts of a proof to be "left to the reader." Is this a "gap"? If I pick up a random math journal not in my field, I will probably not understand any of the proofs. Are these "gaps"? Or just my lack of training? Human proofs always assume the reader will do a lot of work to fill in missing details. It's not clear-cut what a real "gap" is. – Jan 24 '18 at 2:36

[5] @MilesRout, you write: "No, that's not being pedantic. That's just knowing what 'wrong' means." I notice from your profile that you identify yourself as a software developer, not a mathematician. With respect, you are wrong about the mathematician's notion of "wrong". The distinction between missing pedantic details and a genuinely wrong argument is central to our ability to make advances in mathematics: finding the crux of an argument and concentrating on it is how we know where to focus our energy to find (or check) a proof. – HJRW, Jan 24 '18 at 9:25

[5] @MilesRout, to take a step back for a moment, your position here looks to a mathematician a lot like a power grab on the part of the theorem-proving community. You implied that proofs that require minor corrections to be formalized are "wrong, wrong, wrong". This suggested to me that if a mathematician proves a result, and someone makes some minor corrections when formalizing it, then you think the mathematician was "wrong" and the theorem prover gave the correct proof. I hope the two communities can work together much better than that. – HJRW, Jan 27 '18 at 12:50
Not an example of a result shown wrong by using a proof assistant, but an inconsistency in the premises of Gödel's ontological argument was discovered by a higher-order theorem prover: https://www.ijcai.org/Proceedings/16/Papers/137.pdf

[3] Very interesting! I had not heard about this development. – Jan 21 '18 at 19:51

Yes, interesting indeed. Thanks for the reference. – Jan 22 '18 at 16:29
I learned of the following stellar example from Lawrence Paulson.
Anthony Bordg, Yijun He, and Hanna Lachnitt have been involved in an ongoing effort to formalize quantum algorithms and results in quantum information theory in Isabelle/HOL. You can read about their efforts here.
In the course of their project, they naturally found themselves examining one of the seminal papers in the subject, Quantum games and quantum strategies, by J. Eisert, M. Wilkens, and M. Lewenstein. As of this writing, Google Scholar claims that this paper has nearly a thousand citations. Bordg, He, and Lachnitt found that there was a fundamental and unfixable error in one of the main results of the paper. They explain the details in an arXiv preprint.
Although it may also not fully count as an example discovered via a proof assistant with fully formalized mathematics, the classification of the maximal subgroups of all finite classical groups of dimension $\le 12$ [Bray, John N.; Holt, Derek F.; Roney-Dougal, Colva M., The maximal subgroups of the low-dimensional finite classical groups, London Mathematical Society Lecture Note Series 407, Cambridge University Press (2013), ZBL1303.20053] involved large pieces of Magma code, by which the authors were able to correct errors and fill gaps in the earlier theoretical results of [Kleidman, Peter; Liebeck, Martin, The subgroup structure of the finite classical groups, London Mathematical Society Lecture Note Series 129, Cambridge University Press (1990), ZBL0697.20004].