Comparing code similarity with Simian

Download link

Requirements: Java 11 or later

After downloading, extract the archive to get a simian-x.jar file.
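Before running the jar, it can help to confirm that the installed Java meets the Java 11+ requirement. The following is a minimal sketch, not part of Simian itself: the helper name java_major is hypothetical, and the script assumes `java -version` prints a quoted version string such as "11.0.2" or the legacy "1.8.0_281" form.

```shell
#!/bin/sh
# java_major (hypothetical helper): print the major version of a Java version string.
# Legacy strings such as "1.8.0_281" mean Java 8; modern ones such as "11.0.2" mean Java 11.
java_major() {
    v=$1
    case $v in
        1.*) v=${v#1.} ;;      # drop the legacy "1." prefix
    esac
    v=${v%%[!0-9]*}            # keep only the leading digits
    printf '%s\n' "$v"
}

# `java -version` writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
    installed=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    major=$(java_major "$installed")
    if [ "${major:-0}" -ge 11 ]; then
        echo "Java $installed should be new enough for Simian"
    else
        echo "Java $installed looks too old; Simian needs Java 11+" >&2
    fi
else
    echo "java not found on PATH" >&2
fi
```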

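Command-line documentation

Simian's command-line interface lets you run it from a shell, a shell script, or a batch file, scanning directories for all files that match a pattern.

The general form for the Java version is: java -jar simian.jar [options] [files]

Files can be given as any regular shell wildcard pattern or as a simple list of files, and can be mixed with options. (See the examples below.)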
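Most of the patterns in the examples are wrapped in double quotes. A small shell sketch shows what the quotes change: an unquoted pattern is expanded by the shell before the program starts, while a quoted one is passed through as a single argument. The file names and the count_args helper are hypothetical, and it is an assumption (suggested by the quoted examples) that Simian expands quoted patterns itself.

```shell
#!/bin/sh
# Create a scratch directory holding two hypothetical Java files.
demo=$(mktemp -d)
touch "$demo/A.java" "$demo/B.java"

# count_args (hypothetical helper): print how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell expands the pattern to the matching files first.
( cd "$demo" && count_args *.java )      # prints 2

# Quoted: the pattern arrives untouched as one argument.
( cd "$demo" && count_args "*.java" )    # prints 1
```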

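For example, to find all Java files in all subdirectories of the current directory: "**/*.java"

To find all Java files in the current directory and set the threshold to 3: -threshold=3 "*.java"

To find all C# files in the current directory: "*.cs"

To find all C and header files in all subdirectories of the current directory: **/*.c **/*.h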

To find all C# and Java files in two different directories: "/csharp-source/*.cs" "/java-source/*.java"

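To find all Java files in all subdirectories but exclude test classes: -includes=**/*.java -excludes=**/*Test.java

To find all Java files in the current directory and ignore numbers: -ignoreNumbers "*.java"

To find all Ruby files and display the results in XML format: -formatter=xml "*.rb"

To find all Ruby files and send the results to a file in Emacs-compatible format: -formatter=emacs:c:\temp\simian.log "*.rb"

To read the configuration from a file (each line of the file specifies at most one valid command-line argument): -config=simian.config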
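As an illustration of the config-file form, the sketch below writes a simian.config that reuses options from the examples above, one argument per line. The particular combination of options is an assumption chosen for illustration, and simian.jar stands in for the actual simian-x.jar file.

```shell
#!/bin/sh
# Write a config file with one command-line argument per line.
# The quoted 'EOF' keeps the shell from expanding the patterns.
cat > simian.config <<'EOF'
-threshold=3
-includes=**/*.java
-excludes=**/*Test.java
-formatter=xml
EOF

# Show the arguments Simian would read from the file.
cat simian.config

# Then pass the file to Simian (jar name is a placeholder):
# java -jar simian.jar -config=simian.config
```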

Example: comparing the files in two directories and all of their subdirectories:

java -jar simian-4.0.0.jar "dir1/**/*" "dir2/**/*"

Simian Similarity Analyzer 4.0.0 - https://simian.quandarypeak.com
Copyright (c) 2023 Quandary Peak Research. All rights reserved.
Subject to the Quandary Peak Academic Software License.
{failOnDuplication=true, ignoreCharacterCase=true, ignoreCurlyBraces=true, ignoreIdentifierCase=true, ignoreModifiers=true, ignoreStringCase=true, threshold=6}
Found 7 duplicate lines with fingerprint 3efdefb7042d75b240a724928264d72e in the following files:
 Between lines 158 and 165 in F:\Git\base-services\Docs\GitLab部署.md
 Between lines 229 and 235 in F:\Git\base-services\Docs\GitLab部署.md
Found 7 duplicate lines with fingerprint 3af0bfd4c0fbe2f94bd7a902e3ce4866 in the following files:
 Between lines 67 and 73 in F:\Git\base-services\Docs\GitLab部署.md
 Between lines 228 and 234 in F:\Git\base-services\Docs\GitLab部署.md
Found 8 duplicate lines with fingerprint f6282126511e2d39ec6e2cf70be5ba1f in the following files:
 Between lines 100 and 110 in F:\Git\base-services\Docs\SFTP服务部署.md
 Between lines 83 and 93 in F:\base-services\Docs\SFTP服务部署.md
Found 11 duplicate lines with fingerprint 5303d8ff14212e88a89e329597beddcf in the following files:
 Between lines 23 and 33 in F:\Git\base-services\Docs\GitLab部署.md
 Between lines 114 and 124 in F:\Git\base-services\Docs\GitLab部署.md
 Between lines 184 and 194 in F:\Git\base-services\Docs\GitLab部署.md
Found 16 duplicate lines with fingerprint d5308a337a28d47e475feae75f012d6b in the following files:
 Between lines 73 and 92 in F:\Git\base-services\Docs\SFTP服务部署.md
 Between lines 56 and 75 in F:\base-services\Docs\SFTP服务部署.md
Found 18 duplicate lines with fingerprint 5e097ee9defe4601cb67ef451c15ea1d in the following files:
 Between lines 206 and 223 in F:\Git\base-services\Docs\GitLab部署.md
 Between lines 136 and 153 in F:\Git\base-services\Docs\GitLab部署.md
 Between lines 45 and 62 in F:\Git\base-services\Docs\GitLab部署.md
Found 21 duplicate lines with fingerprint 0675756198f866614720c1d66b7678cf in the following files:
 Between lines 136 and 156 in F:\Git\base-services\Docs\GitLab部署.md
 Between lines 206 and 226 in F:\Git\base-services\Docs\GitLab部署.md
Found 27 duplicate lines with fingerprint 6f83c50446e403b047ce1b38c05c2564 in the following files:
 Between lines 1 and 33 in F:\base-services\Docs\基础服务信息.md
 Between lines 1 and 33 in F:\Git\base-services\Docs\基础服务信息.md
Found 43 duplicate lines with fingerprint 5405a4ef9ed6a725549837306f715e27 in the following files:
 Between lines 2 and 50 in F:\base-services\Docs\Zentao服务部署.md
 Between lines 2 and 50 in F:\Git\base-services\Docs\Zentao服务部署.md
Found 375 duplicate lines with fingerprint bb62d75f0ff1828a0f90422de27608ef in the following files:
 Between lines 1 and 448 in F:\base-services\Docs\GitLab操作.md
 Between lines 1 and 448 in F:\Git\base-services\Docs\GitLab操作.md
Found 1059 duplicate lines in 20 blocks in 9 files
Processed a total of 1382 significant (1662 raw) lines in 12 files
Processing time: 0.181sec
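A report like the one above gets long quickly, so it can be useful to condense it. The sketch below is one possible approach, not a Simian feature: the summarize_simian function name is hypothetical, and it assumes the plain-text report format shown above ("Found N duplicate lines with fingerprint ..." followed by indented "Between lines ..." entries). It prints one line per duplicated block with the duplicate line count, the number of locations, and the fingerprint, largest block first.

```shell
#!/bin/sh
# summarize_simian (hypothetical helper): read a plain-text Simian report on stdin
# and print "<duplicate lines> <locations> <fingerprint>" per block, largest first.
summarize_simian() {
    awk '
        # A new block starts: flush the previous one, then remember size and fingerprint.
        /^Found [0-9]+ duplicate lines with fingerprint/ {
            if (fp != "") print lines, count, fp
            lines = $2; fp = $7; count = 0
            next
        }
        # Each indented "Between lines" entry is one location of the current block.
        /^ *Between lines [0-9]+ and [0-9]+ in / { count++; next }
        # Flush the last block at end of input.
        END { if (fp != "") print lines, count, fp }
    ' | sort -rn
}

# Usage (jar name and directories as in the example above):
# java -jar simian-4.0.0.jar "dir1/**/*" "dir2/**/*" | summarize_simian
```

The final totals line ("Found ... duplicate lines in ... blocks in ... files") carries no fingerprint, so the function leaves it out of the summary.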
