Files
Nick K f7c115da4e Improving the documentation
- updated .gitignore to not ignore build.properties
- added missing one-jar build to slickGitLog
- added build.properties
- updated readme
- updated .gitignore
- updated readme
- updated readme

Author:    Nick K <nick.karamanian@gmail.com>
Date:      Fri May 10 12:47:28 2019 -0700
2019-05-10 15:10:46 -07:00
..
2019-05-10 15:10:46 -07:00
2017-07-14 02:07:05 -07:00

Unifier of identities of committers and authors of a git repo

This program extracts all the committers and authors of commits of a git repo and unifies their identities.

The output of this program are both a spreedsheet and a table (in a sql database).

How to run

java authors.jar <repodir> <spreadsheet-file> <sqlite-file>

  • Warning: it will overwrite both files

SQLite

Schema

The table is called authors and has the following schema:

CREATE TABLE authors(
    recordid int PRIMARY KEY, 
    author text, 
    personid text, 
    email text, 
    name text, 
    lcemail text, 
    userid text, 
    domain tex, 
    autcount int,
    comcount int,
    dateadded text, 
    checked text, 
    notes text);

Explanation of the fields.

name explanation notes
recordid An integer, unique across all records
author full email as used in git it should be renamed
personid unique identifier of this record this id is unique per individual
email email parl of the full email
name name part of the full email
lcemail lowercase version of the email useful for reviewing of records
userid lowercase user part of the email
domain lowercase domain part of the email
autcount number of commits this record authored
comcount number of commits this record committed
dateadded date in which the record was added to the db
checked true if this record has been manually checked
notes

License

  • GPL-3+

Dependencies

jgit https://eclipse.org/jgit/ BSD3
poi scala https://github.com/folone/poi.scala Apache-2.0
apache poi https://poi.apache.org/ Apache-2.0
slick http://slick.lightbend.com/ MIT
sqlite-jdbc Apache-2.0
HikariCP Apache-2.0

TODO

The following are issues that need to be addressed.

  • It need to take into consideration the .mailmap file
  • It needs to Support incrementally adding authors to the database. The idea behind is that once names have been vetted, those names will not be matched again.
  • Ask if the files should be overwritten