Removing duplicate files

Anonymous

I have been able to produce a report file listing all the filenames in a path, with data such as:

Folder Compare
Produced: 11/01/10 05:07:56 PM

Mode: All
Base folder: ~/.kde/share/apps/kmail/mail
Name Size CRC Name Size CRC
------------------------------------------------------------------------------------------------
1232364633.13472.s7ZYS:2,S 19,370 0003C905 >>
1233650469.8190.4bQ5o:2,S 974 0005586C >>
1233650255.8190.qAdNh:2,S 4,104 000835C0 >>
1233650291.8190.5f5ud:2,S 1,275 000A3AD5 >>
1233650301.8190.TCuxJ 2,308 000B6FA3 >>

Because the report was sorted by CRC, I'd like to open the file, read it line by line, and, wherever the CRC is the same as on the previous line, add that line's details to an array. After reading the whole report, I'd either display the array or generate commands from it and write them to a bash file (such as 'rm filename' commands).
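The comparison described above could look roughly like the sketch below. It assumes the rows have already been parsed into name/size/CRC arrays and are still in CRC order, as in the report. The function name findDuplicates and the sample rows are hypothetical, and the sample values are made up purely to show a match.

```php
<?php
// Hypothetical helper: given rows sorted by CRC, return every row whose
// CRC and size equal those of the first row in its group.
function findDuplicates(array $rows): array
{
    $duplicates = [];
    $previous = null;
    foreach ($rows as $row) {
        if ($previous !== null
            && $row['crc'] === $previous['crc']
            && $row['size'] === $previous['size']) {
            $duplicates[] = $row;   // keep the first copy, flag the rest
        } else {
            $previous = $row;       // a new CRC group starts here
        }
    }
    return $duplicates;
}

// Made-up sample rows, for illustration only.
$rows = [
    ['name' => 'a:2,S', 'size' => 974,  'crc' => '0005586C'],
    ['name' => 'b:2,S', 'size' => 974,  'crc' => '0005586C'],
    ['name' => 'c',     'size' => 2308, 'crc' => '000B6FA3'],
];

foreach (findDuplicates($rows) as $dup) {
    // escapeshellarg() quotes the name so characters like ':' and ',' are safe.
    echo 'rm -- ' . escapeshellarg($dup['name']) . "\n";
}
```

A matching CRC does not strictly guarantee identical content, so the sketch also compares sizes. It would be worth reviewing the generated rm commands before running them.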

Here is the PHP code so far:

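<?php
// fopen() does not expand "~", so build the path from the HOME environment variable.
$path = getenv('HOME') . '/Documents/temp/Report.txt';
$handle = @fopen($path, 'r');
if ($handle) {
    // fgets() returns false at end of file, which ends the loop cleanly.
    while (($buffer = fgets($handle, 4096)) !== false) {
        echo $buffer;
        $pieces = explode(' ', $buffer);
        print_r($pieces);
    }
    fclose($handle);
}
?>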

The echo and print_r calls are only there to show how the data looks to PHP. The explode produces this:

Array
(
[0] => 1233650161.8190.VtijH
[1] =>
[2] =>
[3] =>
[4] =>
[5] =>
[6] =>
[7] =>
[8] =>
[9] =>
[10] =>
[11] =>
[12] =>
[13] =>
[14] =>
[15] =>
[16] =>
[17] =>
[18] =>
[19] =>
[20] =>
[21] =>
[22] =>
[23] =>
[24] =>
[25] =>
[26] =>
[27] =>
[28] =>
[29] =>
[30] =>
[31] =>
[32] =>
[33] =>
[34] =>
[35] =>
[36] =>
[37] =>
[38] =>
[39] =>
[40] => 404,280
[41] =>
[42] =>
[43] => 00F267FC
[44] => >>
)

The filename always seems to be in element zero, but the size and CRC end up in different array keys depending on the data.

How can I do the split (or something similar) so that I get just the filename, size and CRC? I also want to skip the first 7 lines, and to skip lines like this:

.Templates.index
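One possible way to do the split is sketched below, assuming each data line is laid out as name, size, an 8-hex-digit CRC, then ">>", and that filenames contain no spaces. The function name parseReportLine is hypothetical. It uses preg_split with a whitespace pattern instead of explode, so runs of spaces collapse into a single separator.

```php
<?php
// Hypothetical helper: return name, size and CRC for a data line, or null
// for headers, rulers, blank lines and entries such as ".Templates.index".
function parseReportLine(string $line): ?array
{
    // Split on runs of whitespace and drop empty pieces.
    $fields = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);

    // Assumed layout of a data line: name, size, 8-hex-digit CRC, then ">>".
    if (count($fields) < 3
        || !preg_match('/^[0-9A-F]{8}$/i', $fields[2])) {
        return null;
    }
    return [
        'name' => $fields[0],
        'size' => (int) str_replace(',', '', $fields[1]), // "19,370" -> 19370
        'crc'  => strtoupper($fields[2]),
    ];
}

$sample = "1233650469.8190.4bQ5o:2,S                  974 0005586C >>";
print_r(parseReportLine($sample));
var_dump(parseReportLine('.Templates.index'));            // NULL
var_dump(parseReportLine('Name Size CRC Name Size CRC')); // NULL
```

Because none of the header lines has an 8-hex-digit value in the third field, they are rejected by the same check. An explicit line counter would also work for skipping the first 7 lines.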

Once I can get the filename, size and CRC, I can simply store them in variables and compare them with the values from the next line read from the file.

Thanks,

J

I just skimmed over your post

I just skimmed over your post, and if I'm guessing your question correctly, it sounds like you have data sitting at unpredictable positions in an array that you want to extract. I would check out this function:

http://us2.php.net/manual/en/function.preg-grep.php

Basically, you can apply a regular expression to each element of the array, and it will return the elements that match.
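As a rough illustration, assuming the goal is just to drop the empty elements that explode leaves behind, preg_grep could be used like this. The $pieces array is a shortened stand-in for the real output, not the full 45-element array.

```php
<?php
// A shortened stand-in for the exploded line: name, empty strings, size, CRC, ">>".
$pieces = ['1233650161.8190.VtijH', '', '', '404,280', '', '00F267FC', '>>'];

// Keep only elements containing at least one non-whitespace character.
// preg_grep() preserves the original keys, so array_values() renumbers them.
$fields = array_values(preg_grep('/\S/', $pieces));

print_r($fields);
```

After this, the filename, size and CRC sit at keys 0, 1 and 2. With lines read by fgets, the last element still carries the trailing newline, so trimming the line before exploding it would help.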

Lemme know if that helped at all.
