[Download RemoveDuplicates.py][dl]

[dl]: http://nongraphical.com/media/uploads/RemoveDuplicates.py

One of the problems with using a hybrid Windows and Linux environment is that you need to watch closely for filesystem and file inconsistencies. Differing end-of-line markers, for example, cause many problems when sharing files between the two operating systems. One particular problem I’ve run into is duplicate files: multiple files in the same directory whose names differ only in capitalization. This can happen if, say, you copy a directory somewhere in Windows, then switch to Linux and use a tool such as rsync to copy that same directory over again. If the capitalization is different, Linux will not replace the old files, because Linux, unlike Windows, is case-sensitive. This can even happen, and is technically allowed, on NTFS filesystems.
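To make the problem concrete, here is a minimal sketch of how such collisions could be detected. It is not the code from RemoveDuplicates.py; the function name `find_case_collisions` is made up for illustration, and using `str.casefold()` (Python 3) is only an approximation of how Windows compares filenames.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only in capitalization.

    Returns a sorted list of groups; each group is a sorted list of two
    or more names that share the same case-folded form.
    """
    groups = defaultdict(list)
    for name in names:
        # Assumption: casefold() stands in for the filesystem's own
        # case-insensitive comparison, which may differ in edge cases.
        groups[name.casefold()].append(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

Passing it the result of `os.listdir(".")` would report the colliding names in the current directory without touching any files.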

The solution I’ve come up with is a simple script called RemoveDuplicates.py. You need [Python][py] installed to run it, but it has no additional dependencies. Simply run it *in the directory you wish to clean*, and it should do the rest. Note that you shouldn’t use this on entire filesystems (yet), because it will use a very large amount of memory when given a large number of files. [Download it here][dl]!

[py]: http://www.python.org/
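On the memory issue: one possible way to keep memory use bounded is to handle one directory at a time instead of collecting every path first. The sketch below is an assumption about how that could look, not a description of what the script currently does. The name `iter_case_collisions` and its `walk` parameter are hypothetical; `walk` exists only so that the traversal can be swapped out for testing.

```python
import os
from collections import defaultdict


def iter_case_collisions(root, walk=os.walk):
    """Yield (directory, names) for every group of entries in a single
    directory whose names differ only in capitalization."""
    for dirpath, dirnames, filenames in walk(root):
        # Only the entries of the current directory are held in memory.
        groups = defaultdict(list)
        for name in dirnames + filenames:
            groups[name.casefold()].append(name)
        for group in groups.values():
            if len(group) > 1:
                yield dirpath, sorted(group)
```

Because this is a generator, a caller can print or act on each collision as it is found. It deliberately only reports; deciding which copy to keep is left to the caller.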

P.S. I cannot guarantee that this tool will work as intended or be bug-free. Use it wisely.