Features

Simian runs under any Java 2 1.4 or higher Java Virtual Machine (JVM) and any .NET 1.1 or higher environment, meaning Simian can be run on anything from Windows, macOS and Linux to z/OS.

The distribution contains everything you need to be up and running in minutes:

Aslak Hellesoy has kindly donated a Maven plugin.

Neil Bartlett has kindly donated an Eclipse plugin.

Simian fully supports the following languages:

with partial support for the following languages:

If the file is not of a supported type, it is treated as plain text. This means that you can usually run Simian on just about any type of human-readable file with good results.

Ignores whitespace, curly braces, comments, imports, includes, package declarations, etc.
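As a rough illustration of what ignoring these elements means in practice, the sketch below reduces Java-like source to the lines that would still be worth comparing. It is not Simian's implementation: the helper name `significant_lines` and its rules (drop `//` comments, `import` and `package` lines, curly braces and all whitespace) are assumptions chosen only to mirror the list above.

```python
import re

# Hypothetical helper, not part of Simian: a simplified view of which
# lines remain "significant" once the ignored elements are removed.
_SKIP = re.compile(r"^(import|package)\b")

def significant_lines(source):
    """Return (line_number, normalised_text) pairs for lines worth comparing."""
    result = []
    for number, raw in enumerate(source.splitlines(), start=1):
        # Drop line comments (naive: does not handle "//" inside strings).
        text = raw.split("//", 1)[0].strip()
        # Skip import and package declarations entirely.
        if _SKIP.match(text):
            continue
        # Ignore curly braces, then remove every whitespace character.
        text = text.replace("{", "").replace("}", "")
        text = "".join(text.split())
        if text:
            result.append((number, text))
    return result
```

Original line numbers are kept alongside the normalised text so that any match can still be reported against the raw file, in the same way the sample output reports raw line ranges.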

Supports the following processing options:

option | languages | default | possible values | description
formatter | all | none | plain, xml, emacs, vs (visual studio), yaml | Specifies the format in which processing results will be produced.
threshold | all | 6 | integer >= 2 | Matches will contain at least the specified number of lines.
language | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml | Assumes all files are in the specified language.
failOnDuplication | all | true | boolean | Causes the checker to fail the current process if duplication is detected.
reportDuplicateText | all | false | boolean | Prints the duplicate text in reports.
ignoreBlocks | all | none | string | Ignores all lines between specified START/END markers.
ignoreCurlyBraces | Java, C#, C, C++, JavaScript, Ruby | false | boolean | Curly braces are ignored.
ignoreIdentifiers | Java, C#, C, C++, JavaScript, COBOL, Ruby | false | boolean | Completely ignores all identifiers.
ignoreIdentifierCase | Java, C#, C, C++, JavaScript, COBOL, Ruby | true | boolean | Matches identifiers irrespective of case. E.g. MyVariableName and myvariablename would both match.
ignoreRegions | C# | false | boolean | Ignores lines between #region/#endregion.
ignoreStrings | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL | false | boolean | MyVariable and myvariable would both match.
ignoreStringCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL | true | boolean | "Hello, World" and "HELLO, WORLD" would both match.
ignoreNumbers | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL | false | boolean | int x = 1; and int x = 576; would both match.
ignoreCharacters | Java, C#, C, C++, JavaScript, COBOL, Ruby | false | boolean | 'A' and 'Z' would both match.
ignoreCharacterCase | Java, C#, C, C++, JavaScript, COBOL, Ruby | true | boolean | 'A' and 'a' would both match.
ignoreLiterals | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL | false | boolean | 'A', "one" and 27.8 would all match.
ignoreSubtypeNames | Java, C | false | boolean | BufferedReader, StringReader and Reader would all match.
ignoreModifiers | Java, C#, C, C++, JavaScript | true | boolean | public, protected, static, etc.
ignoreVariableNames | Java, C | false | boolean | Completely ignores variable names (field, parameter and local). E.g. int foo = 1; and int bar = 1; would both match.
balanceParentheses | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL | false | boolean | Ensures that expressions inside parentheses that are split across multiple physical lines are considered as one.
balanceCurlyBraces | Ruby | false | boolean | Ensures that expressions inside curly braces that are split across multiple physical lines are considered as one.
balanceSquareBrackets | Java, C#, C, C++, JavaScript, Ruby | false | boolean | Ensures that expressions inside square brackets that are split across multiple physical lines are considered as one. Defaults to false.
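To make the defaults and the threshold constraint concrete, here is a minimal sketch that models the boolean options and `threshold` as a dictionary and validates overrides against them. The names `DEFAULTS` and `resolve_options` are hypothetical and the code says nothing about how Simian itself parses its options; the string-valued options (`formatter`, `language`, `ignoreBlocks`) are left out for brevity.

```python
# Defaults copied from the options table; string-valued options omitted.
DEFAULTS = {
    "threshold": 6,
    "failOnDuplication": True,
    "reportDuplicateText": False,
    "ignoreCurlyBraces": False,
    "ignoreIdentifiers": False,
    "ignoreIdentifierCase": True,
    "ignoreRegions": False,
    "ignoreStrings": False,
    "ignoreStringCase": True,
    "ignoreNumbers": False,
    "ignoreCharacters": False,
    "ignoreCharacterCase": True,
    "ignoreLiterals": False,
    "ignoreSubtypeNames": False,
    "ignoreModifiers": True,
    "ignoreVariableNames": False,
    "balanceParentheses": False,
    "balanceCurlyBraces": False,
    "balanceSquareBrackets": False,
}

def resolve_options(overrides=None):
    """Merge overrides into the defaults, rejecting unknown or ill-typed values."""
    options = dict(DEFAULTS)
    for name, value in (overrides or {}).items():
        if name not in DEFAULTS:
            raise KeyError(f"unknown option: {name}")
        expected = type(DEFAULTS[name])
        # Strict type check so that True is not accepted as a threshold.
        if type(value) is not expected:
            raise TypeError(f"{name} expects {expected.__name__}")
        options[name] = value
    # The table specifies "integer >= 2" for threshold.
    if options["threshold"] < 2:
        raise ValueError("threshold must be an integer >= 2")
    return options
```

For example, `resolve_options({"threshold": 9, "ignoreCurlyBraces": True})` returns the full set of defaults with those two values replaced.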

Recognises the following file extensions/language options:

language | extensions
java | java
c sharp | cs, c#, csharp
c | c, h, m
cpp | cpp, c++, hpp, cplusplus
ruby | rb, ruby
cobol | cobol
abap | abap
xml | xml, xsl, xsd
jsp | jsp
asp | asp
javascript | js, javascript
html | html, htm
vb | vb, bas, cls, frm
lisp | lisp, lsp
text | this is the default when no appropriate language can be determined
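The extension table above, together with the plain-text fallback, can be sketched as a simple lookup. This is an illustration only: `EXTENSIONS` and `language_for` are hypothetical names, and treating extensions case-insensitively is an assumption, not something the table states.

```python
import os

# Language -> extensions, transcribed from the table above.
EXTENSIONS = {
    "java": ["java"],
    "c sharp": ["cs", "c#", "csharp"],
    "c": ["c", "h", "m"],
    "cpp": ["cpp", "c++", "hpp", "cplusplus"],
    "ruby": ["rb", "ruby"],
    "cobol": ["cobol"],
    "abap": ["abap"],
    "xml": ["xml", "xsl", "xsd"],
    "jsp": ["jsp"],
    "asp": ["asp"],
    "javascript": ["js", "javascript"],
    "html": ["html", "htm"],
    "vb": ["vb", "bas", "cls", "frm"],
    "lisp": ["lisp", "lsp"],
}

# Inverted index: extension -> language.
_BY_EXTENSION = {ext: lang for lang, exts in EXTENSIONS.items() for ext in exts}

def language_for(filename):
    """Return the language for a file name, or "text" when none applies."""
    # Assumption: extensions are compared case-insensitively.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return _BY_EXTENSION.get(ext, "text")
```

A file such as `notes.txt`, or one with no extension at all, falls through to `"text"`, which corresponds to the last row of the table.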

Sample Output

Here is an example of the standard output produced by Simian (version 2.0.3) when run against the JDK 1.4.2_03 source code:

Similarity Analyser 2.1.2 - http://www.redhillconsulting.com.au/products/simian/index.html
Copyright (c) 2003-04 RedHill Consulting, Pty. Ltd.  All rights reserved.
Simian is not free unless used solely for non-commercial or evaluation purposes.
{ignoreCurlyBraces=true, ignoreModifiers=true, ignoreStringCase=true, threshold=9}
Loading (recursively) *.java from /var/tmp/jdksrc
Found 9 duplicate lines in the following files:
 Between lines 65 and 76 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicSliderUI.java
 Between lines 71 and 82 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthSliderUI.java
Found 9 duplicate lines in the following files:
 Between lines 37 and 49 in /var/tmp/jdksrc/com/sun/java/swing/plaf/motif/MotifCheckBoxMenuItemUI.java
 Between lines 43 and 55 in /var/tmp/jdksrc/com/sun/java/swing/plaf/motif/MotifRadioButtonMenuItemUI.java
 Between lines 36 and 48 in /var/tmp/jdksrc/com/sun/java/swing/plaf/motif/MotifMenuItemUI.java
Found 9 duplicate lines in the following files:
 Between lines 391 and 435 in /var/tmp/jdksrc/org/apache/xml/dtm/ref/DTMDocumentImpl.java
 Between lines 1533 and 1577 in /var/tmp/jdksrc/org/apache/xml/dtm/ref/dom2dtm/DOM2DTM.java
Found 9 duplicate lines in the following files:
 Between lines 1744 and 1758 in /var/tmp/jdksrc/javax/swing/plaf/metal/MetalFileChooserUI.java
 Between lines 1995 and 2009 in /var/tmp/jdksrc/com/sun/java/swing/plaf/windows/WindowsFileChooserUI.java
 Between lines 849 and 863 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/GTKFileChooserUI.java
Found 9 duplicate lines in the following files:
 Between lines 47 and 59 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicMenuBarUI.java
 Between lines 55 and 67 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthMenuBarUI.java
...
Found 285 duplicate lines in the following files:
 Between lines 42 and 599 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicTableUI.java
 Between lines 43 and 600 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthTableUI.java
Found 285 duplicate lines in the following files:
 Between lines 471 and 1123 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicComboPopup.java
 Between lines 468 and 1120 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthComboPopup.java
Found 334 duplicate lines in the following files:
 Between lines 1950 and 2461 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthTabbedPaneUI.java
 Between lines 2199 and 2710 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicTabbedPaneUI.java
Found 384 duplicate lines in the following files:
 Between lines 739 and 1660 in /var/tmp/jdksrc/com/sun/java/swing/plaf/gtk/SynthListUI.java
 Between lines 710 and 1631 in /var/tmp/jdksrc/javax/swing/plaf/basic/BasicListUI.java
Found 435 duplicate lines in the following files:
 Between lines 84 and 545 in /var/tmp/jdksrc/org/apache/xalan/res/XSLTErrorResources_ko.java
 Between lines 121 and 579 in /var/tmp/jdksrc/org/apache/xalan/res/XSLTErrorResources.java
Found 68412 duplicate lines in 3143 blocks in 953 files
Processed a total of 414712 significant (1295861 raw) lines in 4136 files
Processing time: 24.916sec
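Output in this plain format is regular enough to post-process. The sketch below is one possible way to read it, assuming the line shapes shown in the sample ("Found N duplicate lines in the following files:" followed by "Between lines A and B in PATH"); the function name `parse_report` and the returned structure are hypothetical, and other formatters or versions may produce different text.

```python
import re

# Patterns based only on the sample output shown above.
_FOUND = re.compile(r"^Found (\d+) duplicate lines in the following files:$")
_BETWEEN = re.compile(r"^\s*Between lines (\d+) and (\d+) in (.+)$")

def parse_report(text):
    """Return a list of blocks, each with a line count and its file locations."""
    blocks = []
    for line in text.splitlines():
        found = _FOUND.match(line)
        if found:
            # Start of a new duplicate block.
            blocks.append({"lines": int(found.group(1)), "locations": []})
            continue
        between = _BETWEEN.match(line)
        if between and blocks:
            start, end, path = between.groups()
            blocks[-1]["locations"].append((path, int(start), int(end)))
    return blocks
```

The closing summary line does not end in "files:", so it is not mistaken for a block. From the totals in the sample, 68412 duplicate lines out of 414712 significant lines is roughly 16.5% duplication.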

To see the full results* for the JDK 1.4.2_03 source code, download either the 350k plain text or a 43k compressed version.

* Results may vary depending on factors such as hardware used, number of duplicate lines, etc.


Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.

.NET and all .NET-based marks are trademarks or registered trademarks of Microsoft® in the United States and other countries.

Copyright (c) 2003-07 RedHill Consulting Pty. Ltd. All rights reserved.