The corruption in this question is isolated to a single substring at the end of the serialized string with was probably manually replaced by someone who lazily wanted to update the `image` filename. This fact will be apparent in my demonstration link below using the OP's posted data -- in short, `C:fakepath100.jpg` does not have a length of `19`, it should be `17`.
Since the serialized string corruption is limited to an incorrect byte/character count number, the following will do a fine job of updating the corrupted string with the correct byte count value.
The following regex based replacement will only be effective in remedying byte counts, nothing more.
------------------------------------------------------------------------
It looks like many of the earlier posts are just copy-pasting a regex pattern from someone else. **There is no reason to capture the potentially corrupted byte count number if it isn't going to be used in the replacement.** Also, adding the `s` pattern modifier is a reasonable inclusion in case a string value contains newlines/line returns.
*For those that are not aware of the treatment of multibyte characters with serializing, you **must not use `mb_strlen()` in the custom callback because it is the byte count that is stored not the character count**, see my output...
Code: ([Demo with OP's data][1]) ([Demo with arbitrary sample data][2]) ([Demo with condition replacing][3])
$corrupted = <<<STRING
a:4:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
newline2";i:3;s:6:"garçon";}
STRING;
$repaired = preg_replace_callback(
'/s:\d+:"(.*?)";/s',
// ^^^- matched/consumed but not captured because not used in replacement
function ($m) {
return "s:" . strlen($m[1]) . ":\"{$m[1]}\";";
},
$corrupted
);
echo $corrupted , "\n" , $repaired;
echo "\n---\n";
var_export(unserialize($repaired));
Output:
a:4:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
Newline2";i:3;s:6:"garçon";}
a:4:{i:0;s:5:"three";i:1;s:4:"five";i:2;s:17:"newline1
Newline2";i:3;s:7:"garçon";}
---
array (
0 => 'three',
1 => 'five',
2 => 'newline1
Newline2',
3 => 'garçon',
)
---
**One leg down the rabbit hole...** The above works fine even if double quotes occur in a string value, but if a string value contains `";` or some other monkeywrenching sbustring, you'll need to go a little further and implement "lookarounds". My new pattern
checks that the leading `s` is:
- the start of the entire input string or
- preceded by `;`
and checks that the `";` is:
- at the end of the entire input string or
- followed by `}` or
- followed by a string or integer declaration `s:` or `i:`
I haven't test each and every possibility; in fact, I am relatively unfamiliar with all of the possibilities in a serialized string because I never elect to work with serialized data -- always json in modern applications. If there are additional possible leading or trailing characters, leave a comment and I'll extend the lookarounds.
Extended snippet: ([Demo][4])
$corrupted_byte_counts = <<<STRING
a:12:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
newline2";i:3;s:6:"garçon";i:4;s:111:"double " quote \"escaped";i:5;s:1:"a,comma";i:6;s:9:"a:colon";i:7;s:0:"single 'quote";i:8;s:999:"semi;colon";s:5:"assoc";s:3:"yes";i:9;s:1:"monkey";wrenching doublequote-semicolon";s:3:"s:";s:9:"val s: val";}
STRING;
$repaired = preg_replace_callback(
'/(?<=^|;)s:\d+:"(.*?)";(?=$|}|[si]:)/s',
//^^^^^^^^--------------^^^^^^^^^^^^^-- some additional validation
function ($m) {
return 's:' . strlen($m[1]) . ":\"{$m[1]}\";";
},
$corrupted_byte_counts
);
echo "corrupted serialized array:\n$corrupted_byte_counts";
echo "\n---\n";
echo "repaired serialized array:\n$repaired";
echo "\n---\n";
print_r(unserialize($repaired));
Output:
corrupted serialized array:
a:12:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
newline2";i:3;s:6:"garçon";i:4;s:111:"double " quote \"escaped";i:5;s:1:"a,comma";i:6;s:9:"a:colon";i:7;s:0:"single 'quote";i:8;s:999:"semi;colon";s:5:"assoc";s:3:"yes";i:9;s:1:"monkey";wrenching doublequote-semicolon";s:3:"s:";s:9:"val s: val";}
---
repaired serialized array:
a:12:{i:0;s:5:"three";i:1;s:4:"five";i:2;s:17:"newline1
newline2";i:3;s:7:"garçon";i:4;s:24:"double " quote \"escaped";i:5;s:7:"a,comma";i:6;s:7:"a:colon";i:7;s:13:"single 'quote";i:8;s:10:"semi;colon";s:5:"assoc";s:3:"yes";i:9;s:39:"monkey";wrenching doublequote-semicolon";s:2:"s:";s:10:"val s: val";}
---
Array
(
[0] => three
[1] => five
[2] => newline1
newline2
[3] => garçon
[4] => double " quote \"escaped
[5] => a,comma
[6] => a:colon
[7] => single 'quote
[8] => semi;colon
[assoc] => yes
[9] => monkey";wrenching doublequote-semicolon
[s:] => val s: val
)
[1]:
[To see links please register here]
[2]:
[To see links please register here]
[3]:
[To see links please register here]
[4]:
[To see links please register here]