JustiaGate: The Cover-Up Continues
(Updates follow article)
On Friday, October 21, 2011, this column exposed the scrubbing of Supreme Court cases from the legal research website Justia.com. On the following Monday, October 24, Justia founder and CEO Tim Stanley gave a very short response about this scandal to Declan McCullagh at CNET. (CNET is a tech-heavy website aimed more at developers than at the legal community.)
There Stanley asserted that citations in the 25 relevant cases (and more) were “mangled” due to a coding error. The code in question is a regular expression, “regex” for short, which is a pattern language for matching text. Here it acts essentially as a filter: it matches specific characters or sequences of characters so that they can be included in, or excluded from, a result. The result is what you see in an internet browser. Raw data is passed through the regex and placed into its correct positions in a webpage template.
Stanley attributes the missing data to a single coding error: a “.*” typed where a “\s” belonged.
"In this case, Stanley said, what happened is that Justia's programmers typed in ".*" (which matches any character) when creating a regex. It's now an "\s" (which matches only spaces)." - Declan McCullagh
This column examines the plausibility of Tim Stanley’s statements to CNET by consulting a professional familiar with regex. Dr. David Hansen, Ph.D., is currently a university professor of computer science, and he explains what those two bits of code do.
The “.*” means “match any run of characters,” while “\s” means “match a whitespace character” (the spaces between words, for example). Dr. Hansen simplifies the concept for us:
“ "\s*" will only match a sequence of whitespace, stopping the match as soon as a non-whitespace character is found - so it's very specific and limited. The ".*" is exactly the opposite since it says match anything UNTIL the pattern that follows it is matched. So ".*foo" would match an entire file if "foo" were the last word in the file while "\s*foo" would only match foo and the little bit of whitespace ahead of it in the file. Point is that the patterns don't skip through the file, they match a section at a time,” - Dr. David Hansen PhD.
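Dr. Hansen's two examples can be tried directly. What follows is a minimal Python sketch, not Justia's code: the sample text and variable names are invented for illustration, and the `re.DOTALL` flag is an assumption added here so that "." also matches line breaks, which is what allows ".*foo" to run through a whole multi-line file.

```python
import re

# Hypothetical sample "file": two lines, ending with the word "foo".
text = "first line\nsecond line   foo"

# ".*foo": greedy "match anything" up to the last "foo".
# re.DOTALL lets "." cross line breaks (by default it stops at a newline).
greedy = re.search(r".*foo", text, flags=re.DOTALL)

# "\s*foo": only the whitespace directly in front of "foo", then "foo".
narrow = re.search(r"\s*foo", text)

print(repr(greedy.group(0)))  # the entire sample text
print(repr(narrow.group(0)))  # '   foo'
```

In this sketch the first pattern consumes the whole sample, while the second consumes only the three spaces and the word itself.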
What Attorney Donofrio discovered in July 2011 and again in October 2011 were not blank pages or pages missing large generic chunks. Instead Donofrio found very specific things missing, including the following:
- 25 opinions where the case name “Minor v. Happersett” was removed.
- In all 25 instances where the case name was deleted, the official citation was removed as well.
- Within those 25 cases, there are multiple cases where other key citizenship precedents related to the POTUS eligibility debate were also removed, such as The Slaughter-House Cases, In re Lockwood, Scott v. Sandford, and Osborn v. Bank of the United States.
- Additionally, in some cases full sentences were removed from the Court’s opinion, as in U.S. v. Wong Kim Ark and Pope v. Williams.
What we see removed is a variety of words, names, sentences, and case numbers, which indicates they were deliberately chosen for removal. A regex needs specific targets to look for and matches them precisely, down to the last period. For the code to function predictably at all, it must be correct; a faulty pattern produces unpredictable results, which developers always try to avoid. This lends significant credence to deliberate action being the cause of the removal of such historically important citations.
“The effect would not have been as selective as what you’re identifying [the missing text and citations],” said Dr. Hansen in response to whether it was possible that an accidental placement of a “ .* ” could be responsible for the missing text, citations and case names.
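The difference in scale Dr. Hansen describes can be seen in a minimal Python sketch. This is not Justia's code, which has not been published. The clean-up rule (tidying the whitespace before an opening parenthesis), the sample sentence, and every name below are hypothetical, chosen only to show how one pattern behaves when "\s" is swapped for ".*".

```python
import re

# Invented sample sentence with extra spaces before the year.
line = "Smith v. Jones, 1 U.S. 1   (1900), is cited here."

# Hypothetical intended rule: collapse whitespace before "(" to one space.
intended = re.sub(r"\s+\(", " (", line)

# Same rule with ".*" typed in place of the whitespace match:
# the greedy ".*" swallows everything on the line up to the "(".
mistaken = re.sub(r".*\(", " (", line)

print(intended)
print(mistaken)
```

Under these assumptions, the intended rule changes only the spacing, while the mistaken rule deletes everything on the line ahead of the parenthesis. A single toy pattern cannot settle what happened in the real system. It only shows the kind of broad, unselective damage this particular substitution tends to produce.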
Dr. Hansen stressed that code errors do happen, but that such an error would more than likely have caused problems on that page, or in other parts of the code which interact with the “broken” code phrase. He also noted that this is something which would have been noticed and corrected relatively quickly. Yet in the 25 cases which cite Minor v. Happersett, the text was missing for approximately three years.
The tampered cases had specific text removed, and only that text; the rest was left intact. Such a pattern requires a deliberate effort to remove specific text and phrases while leaving everything else untouched. This, again, leads to the conclusion that these actions were deliberate.
“The bottom line is that the excuse, the plausibility that with a Regex, a “ .* ” could have been mistaken for a “ \s ”, that’s a reasonable thing. But, could you have a regex expression which is sophisticated enough and will fail in such a way that you would have these small exclusions in documents? I would say the odds against that are astronomical. And it would have required an absolutely unbelievably complex Regex that was insensitive to the replacement of the “ .* ” and the “ \s ”. So, can I say for certainty that that’s absolutely impossible? No. But I say the likelihood is so small it’s, it’s, if a student came to me with that excuse on their homework I would tell them you’re nuts.” - Dr. David Hansen PhD.
At his site, Donofrio points out that in a January 2007 interview Tim Stanley said his team did barely any programming on the Google Mini search engine and its results, and emphasized the simplicity of the setup, saying that "it just worked":
"And for us, when we looked at some different alternatives, like doing some of our own programming, or using some of the other search technologies out there, the Google Mini, you know, from our standpoint was just a very simple to use easy solution. We could just install it, index all the data, pull back the data, change the style sheets a little bit, and it just worked. And so that was really one of the driving forces for us." - Tim Stanley via Ken Chan
Stanley indicates that there was nothing complex about the Justia setup and that the implementation team did not run into any problems integrating it. Furthermore, we know from the Wayback Machine snapshots that at the time Stanley gave this candid interview in 2007, none of the 25 cases citing Minor v. Happersett had been corrupted. The cases only became "mangled" in the run-up to the 2008 election.
Tim Stanley’s comments at CNET have been picked up by a few bloggers who state that the code errors Stanley speaks of could happen. In isolated cases this is true. Programmers do make mistakes; however, regex is very brittle because it is so literal and so specific. One might be able to make that argument successfully if there had been just one instance of text being removed. This is not the case. Twenty-five case names (that we know of) were subject to this treatment, along with full sentences, across a broad spectrum of Supreme Court cases.
With regard to Stanley's comments to CNET, Donofrio counters by illustrating that in multiple cases where the official citations were removed, new citations were added to the text that were not in the pre-corrupted versions:
"For example, the Nov. 4, 2006 version of Justia's publication of Colgate v. Harvey, 296 U.S. 404 (1935), finished with a final footnote which contains the case names, Minor v. Happersett, The Slaughterhouse Cases, and In Re Lockwood, as well as the official citations thereto.
But in the Nov. 18, 2008 version of Colgate v. Harvey published by Justia, all of those cases and their official citations are missing (along with a bunch of other cases). Additionally, in the Nov. 18, 2008 version, the very same footnote begins, "83 U.S. 73", which is a citation to a specific page in The Slaughterhouse-Cases. But that particular citation was not in the Nov. 4 2006 version, it's been newly added where the original citation (along with the case name), 16 Wall. 36, has been removed.”
“Therefore, Stanley's alleged innocent regex error had to have accomplished both the removal of data from the Court's opinion while at the same time inserting new data into the opinion.”
Donofrio also observes, "Tim Stanley's published comments at CNET do not address the addition of new data to the 25 cases identified as having been corrupted."
CNET's senior political correspondent, Declan McCullagh, began his report as follows:
"Donofrio...discovered that citations to a 1875 case defining a 'natural-born citizen'--a phrase that has special resonance in discussions about President Obama's eligibility for the office--had been quietly removed before the 2008 elections." - Declan McCullagh
The key word is "removed". McCullagh then attributes the following to Tim Stanley:
"...some citations were mangled because of a programmer's error, not an effort to rewrite history." - CNET
Donofrio points out, "That statement only refers to 'citations' which already existed. It fails to address the insertion of new citations, missing case names, and the erasure of full sentences from opinions of the Court."
McCullagh further discussed the removal of data: "The case in question, which Donofrio noticed had been removed from some citations, is Minor v. Happersett." Again, the key word here is "removed". Neither Stanley's comments nor McCullagh's narrative addresses the new citations which were inserted into the altered versions.
Ultimately, regardless of what coding error Stanley alleges to explain the removal of SCOTUS text from the 25 cases which followed Minor v. Happersett, he has failed to address the insertion of new data. This anomaly was summed up simply by Dr. Hansen:
“If a regex was being used as some sort of filter or to help format output it wouldn't have added information to the later document that wasn't in the former…”
Stay tuned; there’s more to come.
A sincere Hat-tip to Leo Donofrio, Esq. for his significant contributions to this article.
Leo Donofrio has published more on the JustiaGate story: JustiaGate: CEO Tim Stanley Admits Publishing Mangled Supreme Court Opinions - The Oyez Connection - SCOTUS Response