Binary diffs for large files

Completed. Posted Jun 7, 2006. Paid on delivery.

A "diff" is a file that describes the differences between two other files. Given an original copy of a file and a newer version of it, the diff stores only what changed. The diff and the original file can then be used together to recreate the newer version. This makes it possible to keep several versions of arbitrary files in much less space by storing only the original and the subsequent diffs.

I need a version that works on large binary files, from 200MB to more than 8GB, in a reasonable amount of time. Current programs (see bindiff at [url removed, login to view]) take an exhaustive approach, matching sequences as small as 6 bytes, which uses far too much CPU time and RAM to be practical for large files. The large files I need to diff can be expected to contain large contiguous chunks that are identical to the original file but have simply moved to a different location in the file, so a number of shortcuts can be taken. A more detailed discussion of what I am looking for is available here: [url removed, login to view]

## Deliverables

1) Complete, fully functional program(s) in executable form, as well as complete source code for all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to exist in only one place in the Buyer's environment: deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others, including desktop software or software the Buyer intends to distribute: a software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. copyright law. The Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, third-party components, etc., unless all copyright ramifications are explained AND AGREED TO by the Buyer on the site, per the coder's Seller Legal Agreement.)

4) The program source may be licensed however is necessary, so the use of open-source components is allowed.

5) The program, along with all of its options, must be accessible through the Microsoft Windows XP/2003 command line.

6) The program must work with files on Windows network shares (e.g., \\server\sharename\[url removed, login to view]).

7) Settable variables must include: input file 1, input file 2, output file, chunk size, maximum percentage of missing chunks for a matching-byte sequence, matching byte, denied matching bytes, and whether to create or use a diff.

8) The program must be able to process an 8GB file on an Athlon X2 3800+ CPU (or equivalent) with 2GB of RAM within 1 hour.

9) A text log must be recorded for every action, including:

a) Command line used

b) Size of input and output files

c) Start time

d) Time to run

10) The program must provide a method to use the original file and the diff to recreate the updated file.

11) The recreated file must be verified as 100% correct by hashing.

12) The programming language must be C, C++, and/or x86 assembly.

## Platform

Windows XP 32-bit x86, Windows Server 2003, Windows command line


Project ID: #3558146

## About the project

5 proposals. Remote project. Active Jun 15, 2006.

Awarded to: HaroldHardy

5 freelancers are bidding on average $85 for this job:

| Freelancer | Bid | Reviews | Rating |
|---|---|---|---|
| HaroldHardy | $85 USD in 7 days | 86 | 5.8 |
| keathanderson | $85 USD in 7 days | 25 | 4.5 |
| softwareprabhu | $85 USD in 7 days | 15 | 4.5 |
| kitejohn | $85 USD in 7 days | 0 | 0.0 |
| farport | $85 USD in 7 days | 0 | 0.0 |