Comment on Deleted GitHub data is forever accessible to anyone, researchers claim | Cybernews
Morphit@feddit.uk 3 months agoI think Github keeps all the commits of forks in a single pool. So if someone commits a secret to one fork, that commit could be looked up in any of them, even if the one that was committed to was private/is deleted/no references exist to the commit.
The big issue is discovery. If no-one has pulled the leaky commit onto a fork, then the only way to access it is to guess the commit hash. Github makes this easier for you:
What’s more, Ayrey explained, you don’t even need the full identifying hash to access the commit. “If you know the first four characters of the identifier, GitHub will almost auto-complete the rest of the identifier for you,” he said, noting that with just sixty-five thousand possible combinations for those characters, that’s a small enough number to test all the possibilities.
I think all GitHub should do is prune orphaned commits from the auto-suggestion list. If someone grabbed the complete commit ID then they probably grabbed the content already anyway.
Dave@lemmy.nz 3 months ago
Thanks, I think that explains it a bit more. It is unexpected to me, as a non-git expert, and I’m sure many others.
Morphit@feddit.uk 3 months ago
I guess the funny thing is that each Git commit is internally just a file. Branches and tags are just links to specific commit files and of course commits link to their parents. If a branch gets deleted or jumped back to a previous commit, the orphaned commits are still left in the filesystem. Various Git actions can trigger a garbage collection, but unless you generate huge diffs, they usually stick around for a really long time. Determining if a commit is orphaned is work that Git usually doesn’t bother doing. There’s also a reflog that can let you recover lost commits if you make a mistake.