GitHub Search

In summary:
Secrets are committed often, and are discoverable very quickly, likely before the affected parties have time to react. Attackers can, and have, used similar techniques to identify secrets and use them for malicious purposes.

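Historical GitHub datasets are very useful for measuring the scale of the problem or identifying trends over time, but most organizations are more interested in how to monitor for, or prevent, new secret leaks going forward.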

A good reference design

https://lightless.me/archives/How-To-Designing-A-Faster-Than-Faster-GitHub-Monitoring-System.html
Its code implementation:
https://github.com/lightless233/geye
This project is worth comparing against it:
https://github.com/VKSRC/Github-Monitor

How to prevent leaks

If you control the development environment

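Git supports hook scripts that can run checks before a commit. If you fully control the developers' Git environment, you can install a hook that blocks sensitive information before it is ever committed.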
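As a rough illustration of the hook idea, here is a minimal Python sketch of a pre-commit check. It assumes the script is installed as `.git/hooks/pre-commit`; the two patterns and the function names (`added_lines`, `find_secrets`, `main`) are illustrative assumptions, not taken from any of the tools linked in this post, whose rule sets are far larger.

```python
import re
import subprocess
import sys

# Illustrative patterns only (assumption); real rule sets are much larger.
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    re.compile(r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
]


def added_lines(diff_text):
    """Yield lines added by a unified diff, skipping the '+++' file headers."""
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def find_secrets(diff_text):
    """Return the added lines that match any secret pattern."""
    return [
        line for line in added_lines(diff_text)
        if any(p.search(line) for p in SECRET_PATTERNS)
    ]


def main():
    # --cached limits the diff to what is staged for this commit.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = find_secrets(diff)
    for line in hits:
        print("possible secret:", line.strip(), file=sys.stderr)
    # A non-zero exit status makes git abort the commit.
    return 1 if hits else 0
```

To use it as a hook, the file would need a Python shebang line, the executable bit, and a final `sys.exit(main())` call. Developers can still bypass it with `git commit --no-verify`, which is why this only works where the environment is actually under your control.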

If you fully control the Git repository

GitHub supports webhooks, so you can subscribe to commit/push events and trigger a scan whenever one arrives.
https://developer.github.com/webhooks/
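A minimal Python sketch of the receiving side might look like the following. It assumes the webhook was configured with a shared secret, so GitHub signs each delivery in the `X-Hub-Signature-256` header, and it only decides which files a scanner should look at. The function names are hypothetical, and the HTTP server and the scanner itself are left out.

```python
import hashlib
import hmac
import json


def verify_signature(secret, body, signature_header):
    """Check the X-Hub-Signature-256 header against an HMAC of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header or "")


def files_to_scan(payload):
    """Collect (commit id, path) pairs for files added or modified by a push."""
    targets = []
    for commit in payload.get("commits", []):
        for path in commit.get("added", []) + commit.get("modified", []):
            targets.append((commit["id"], path))
    return targets


def handle_push(secret, body, signature_header):
    """Reject unsigned or forged deliveries, then list the files to scan."""
    if not verify_signature(secret, body, signature_header):
        raise PermissionError("signature mismatch")
    return files_to_scan(json.loads(body))
```

Here `secret` and `body` are bytes: the signature is computed over the raw request body, so it has to be verified before the JSON is parsed.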

To completely remove sensitive data from a repo:
https://help.github.com/en/articles/removing-sensitive-data-from-a-repository

References

https://duo.com/labs/research/how-to-monitor-github-for-secrets
https://www.ndss-symposium.org/ndss-paper/how-bad-can-it-git-characterizing-secret-leakage-in-public-github-repositories/

GitHub datasets

https://www.gharchive.org/
Uploaded by GitHub itself:
https://console.cloud.google.com/marketplace/details/github/github-repos?filter=solution-type:dataset&id=46ee22ab-2ca4-4750-81a7-3ee0f0150dcb

Details

Research on secret leakage on GitHub:
https://www.ndss-symposium.org/ndss-paper/how-bad-can-it-git-characterizing-secret-leakage-in-public-github-repositories/
paper:
https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_04B-3_Meli_paper.pdf
video:
https://www.youtube.com/watch?v=N-pg_47s5Ok&index=4&t=1s&list=PLfUWWM-POgQtjEA_FIN7s0XFWoRdW4lil

GitHub dataset:
https://console.cloud.google.com/marketplace/details/github/github-repos?filter=solution-type:dataset&q=github&id=46ee22ab-2ca4-4750-81a7-3ee0f0150dcb&pli=1

Query and browsing interface:
https://console.cloud.google.com/bigquery?organizationId=&angularJsUrl=%2Fbigquery%3Fp%3Dbigquery-public-data%26d%3Dgithub_repos%26page%3Ddataset%26organizationId%3D%26creatingProject%3Dtrue%26angularJsUrl%3D%252Fbigquery%253Fp%253Dbigquery-public-data%2526d%253Dgithub_repos%2526page%253Ddataset%2526supportedpurview%253Dproject%2526organizationId%253D0%2526creatingProject%253Dtrue%26project%3Dfestive-idea-254209%26folder%3D%26supportedpurview%3Dproject&project=festive-idea-254209&folder=&supportedpurview=project&p=bigquery-public-data&d=github_repos&t=files&page=table

Roundup of GitHub monitoring projects

Audit git repos for secrets
A GitHub monitoring framework built on Django

The git-secrets script:
https://github.com/awslabs/git-secrets/blob/3958dacceeebeab84e2a3c686c00fb9bde17cb55/git-secrets

Collection of matching regular expressions

https://github.com/zricethezav/gitleaks/blob/065b6216049d71e7f3c28dec3f4e93a24b304033/gitleaks.toml
https://github.com/michenriksen/gitrob/blob/7be4c5306a61383a3ba16777b520b3c2a8956a1e/core/signatures.go
https://github.com/dxa4481/truffleHog/blob/0d6f2dfea5f9e9b196414f3925b988e1ba62880f/scripts/searchOrg.py
https://github.com/eth0izzle/shhgit/blob/f9b4febcd6ec6c1d509b28efbad6dc1ca9d17837/config.yaml
https://github.com/BishopFox/GitGot/blob/3a754dfcf66707a68d7507aabb5cf44d48f5e924/checks/default.list

Appendix

Code search syntax:
https://help.github.com/en/articles/searching-code

You must be signed in to search for code across all public repositories.

Code in forks is only searchable if the fork has more stars than the parent repository. Forks with fewer stars than the parent repository are not indexed for code search.

Only the default branch is indexed for code search. In most cases, this will be the master branch.

Only files smaller than 384 KB are searchable.

Only repositories with fewer than 500,000 files are searchable.

You can't use the following wildcard characters as part of your search query: . , : ; / \ ` ' " = * ! ? # $ & + ^ | ~ < > ( ) { } [ ]. The search will simply ignore these symbols.
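These restrictions determine what a keyword query can actually find. The Python sketch below encodes them as two helper functions, purely as a reading aid. The names are hypothetical, 384 KB is taken as 384 * 1024 bytes, and treating ignored symbols as word separators is an assumption about how the search tokenizes a query.

```python
# Characters the quoted documentation says are ignored in a search query.
IGNORED_CHARS = set(".,:;/\\`'\"=*!?#$&+^|~<>(){}[]")
MAX_FILE_BYTES = 384 * 1024   # assumption: 1 KB = 1024 bytes
MAX_REPO_FILES = 500_000


def effective_terms(query):
    """Approximate the terms left once ignored symbols are dropped."""
    cleaned = "".join(" " if ch in IGNORED_CHARS else ch for ch in query)
    return cleaned.split()


def is_indexed(size_bytes, repo_file_count, on_default_branch=True,
               is_fork=False, fork_stars=0, parent_stars=0):
    """Apply the documented limits to a single file."""
    if not on_default_branch:
        return False
    if size_bytes >= MAX_FILE_BYTES:
        return False
    if repo_file_count >= MAX_REPO_FILES:
        return False
    # Forks are only searchable with more stars than the parent repository.
    if is_fork and fork_stars <= parent_stars:
        return False
    return True
```

For example, `effective_terms("aws_secret_access_key=AKIA*")` keeps only the two word-like terms, which is why a pattern with punctuation has to be matched offline rather than in the query.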

Notes

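GitHub does not support regular-expression matching in its online search, so the only option is to search for specific keywords and then match the results offline. This paper describes the same approach: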
https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_04B-3_Meli_paper.pdf
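The two-stage idea can be sketched in Python as follows. Each search keyword is paired with a stricter regular expression that is applied offline to the files the search returned. The keyword/regex pairs and function names are illustrative assumptions, and this is not the paper's implementation. `build_search_url` only constructs a query against GitHub's REST code search endpoint; sending it (with authentication) and downloading the matching files are left out.

```python
import re
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/code"

# Stage 1 keyword -> stage 2 regex (illustrative pairs, assumption).
RULES = {
    "AKIA": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "BEGIN RSA PRIVATE KEY": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
}


def build_search_url(keyword):
    """Stage 1: a plain keyword query, since regexes are not supported online."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": keyword})


def match_offline(keyword, files):
    """Stage 2: apply the keyword's regex to downloaded candidate files.

    `files` maps a file path to its text content.
    """
    pattern = RULES[keyword]
    return [
        (path, m.group(0))
        for path, text in files.items()
        for m in pattern.finditer(text)
    ]
```

The keyword stage is deliberately loose and will return many files that merely mention the keyword; the offline regex stage is what filters those out.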
