
WordPress/Apache - 404 error with Unicode characters in image filenames

#1
We've recently moved a website to a new server and are running into an odd issue where some uploaded images with Unicode characters in the filename are giving us a 404 error.

Via SSH/FTP, we can see that the files are definitely there.

For example, none of the images on the affected page are working:

Code:

<img class='image-display' title='' src='http://sjofasting.no/wp/wp-content/uploads/2012/03/ådnøy_1_2.jpg' width='685' height='484'/>

SSH:

> -rw-r--r-- 1 xxxxxxxx xxxxxxxx 836813 Aug 3 16:12 ådnøy_1_2.jpg

What is also strange is that if you navigate to the directory listing in a browser, you can click on 'ådnøy_1_2.jpg' and it works.

Somehow WordPress is generating a different URL for ådnøy_1_2.jpg than the one I get by copying the link from the directory listing.


What is going on??

---
Edit:

If I copy the image URL from the WordPress page source and compare it with the URL copied from the Apache directory listing, the encoding of the å differs.

What could account for this discrepancy between %C3%A5 and %cc%8?

#2
Unicode normalisation.

`0xC3` `0xA5` is the UTF-8 encoding for U+00E5 a-with-ring.

`0xCC` `0x8A` is the UTF-8 encoding for U+030A combining ring.

U+00E5 is the composed (Normal Form C) way of writing an a-ring; an `a` followed by U+030A is the decomposed (Normal Form D) way of writing it. `[char U+00E5]` and `a[char U+030A]` should look the same when rendered, though they may differ slightly depending on the font.
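A minimal sketch using Python's standard `unicodedata` and `urllib.parse` modules shows the two forms side by side: they compare unequal as strings, encode to different UTF-8 bytes, and therefore percent-encode to different URLs. The filename is the one from the question (written with escapes so the normalization of this source text itself doesn't matter); `url_path_for` is just an illustrative helper name.

```python
import unicodedata
from urllib.parse import quote

# 'ådnøy_1_2.jpg' from the question, spelled with escapes.
NAME = "\u00e5dn\u00f8y_1_2.jpg"

# Normal Form C: 'å' is the single code point U+00E5.
nfc = unicodedata.normalize("NFC", NAME)
# Normal Form D: 'å' becomes 'a' followed by U+030A COMBINING RING ABOVE.
# ('ø', U+00F8, has no canonical decomposition, so it is unchanged.)
nfd = unicodedata.normalize("NFD", NAME)

def url_path_for(name: str) -> str:
    """Percent-encode a filename's UTF-8 bytes the way they appear in a URL."""
    return quote(name)

print(nfc == nfd)                # False: different code point sequences
print(nfc[0].encode("utf-8"))    # b'\xc3\xa5'
print(nfd[:2].encode("utf-8"))   # b'a\xcc\x8a'
print(url_path_for(nfc))         # %C3%A5dn%C3%B8y_1_2.jpg
print(url_path_for(nfd))         # a%CC%8Adn%C3%B8y_1_2.jpg
```

The `%C3%A5` in the first URL and the `a%CC%8A` in the second are exactly the two byte sequences described above.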

Now normally it doesn't really matter which one you've got because sensible filesystems leave them untouched. If you save a file called `[char U+00E5].txt` (`å.txt`), it stays called that under Windows and Linux.

Macs, on the other hand, are insane. The filesystem prefers Normal Form D, to the extent that any composed characters you pass into it get converted into decomposed ones. If you put a file in called `[char U+00E5].txt` and immediately list the directory, you'll find you've actually got a file called `a[char U+030A].txt`. You *can* still access the file as `[char U+00E5].txt` on a Mac because it'll convert that input into Normal Form D too before looking it up, but you *cannot* recover the same filename in character sequence terms as you put in: it's a lossy conversion.
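To see why that's lossy, here is a rough model of a normalizing filesystem. `NFDNormalizingDir` is a hypothetical in-memory stand-in that applies plain Unicode NFD to every name on write and on lookup; the real HFS+ uses its own variant of decomposition, so treat this as an approximation of the behaviour, not of the exact rules.

```python
from __future__ import annotations

import unicodedata

class NFDNormalizingDir:
    """Hypothetical directory that stores names in NFD (roughly like HFS+)."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Every name, whether being stored or looked up, is converted to NFD.
        # (Plain Unicode NFD here; HFS+ uses a variant of it.)
        return unicodedata.normalize("NFD", name)

    def write(self, name: str, data: bytes) -> None:
        self._files[self._key(name)] = data

    def read(self, name: str) -> bytes:
        return self._files[self._key(name)]

    def listdir(self) -> list[str]:
        return sorted(self._files)

composed = "\u00e5.txt"      # [char U+00E5].txt
decomposed = "a\u030a.txt"   # a[char U+030A].txt

d = NFDNormalizingDir()
d.write(composed, b"hello")
print(d.listdir() == [decomposed])  # True: only the decomposed name is stored
print(d.read(composed))             # b'hello': lookups get normalized too
```

Once the name has gone through `_key`, nothing in the listing records that the caller originally used the composed spelling.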

So if you save your files on a Mac and then transfer to a filesystem where `[char U+00E5].txt` and `a[char U+030A].txt` refer to different files, you will get broken links.
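To find which uploads are affected after such a transfer, one option is to walk the uploads directory and flag every filename that isn't already in NFC, since those are the ones whose on-disk name won't match a composed URL. A sketch, assuming Python 3.8+ (for `unicodedata.is_normalized`) and a UTF-8 filesystem encoding; `find_non_nfc_names` is a hypothetical helper.

```python
import os
import unicodedata
from typing import Iterator, Tuple

def find_non_nfc_names(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, nfc_name) for each file under root whose name isn't NFC."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not unicodedata.is_normalized("NFC", name):
                # Report the on-disk path plus the composed name a link would use.
                yield os.path.join(dirpath, name), unicodedata.normalize("NFC", name)
```

Pointed at something like `wp-content/uploads` (the layout from the question), it would list files such as the decomposed `ådnøy_1_2.jpg` alongside the composed name WordPress links to.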

Update the pages to point to the Normal Form D versions of the URLs, or re-upload the files from a filesystem that doesn't egregiously mangle Unicode characters.
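For the first option, each link needs the decomposed spelling, percent-encoded. A minimal standard-library sketch, assuming only the URL path needs normalizing and that the existing URL may already contain `%XX` escapes (hence the `unquote` first); `to_nfd_url` is a hypothetical name.

```python
import unicodedata
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def to_nfd_url(url: str) -> str:
    """Return url with its path re-encoded in Unicode Normal Form D."""
    parts = urlsplit(url)
    # Decode any existing %XX escapes, decompose, then percent-encode again.
    # quote() leaves '/' unescaped by default, so path separators survive.
    path = quote(unicodedata.normalize("NFD", unquote(parts.path)))
    return urlunsplit(parts._replace(path=path))
```

For the question's image, both `.../2012/03/ådnøy_1_2.jpg` and `.../2012/03/%C3%A5dn%C3%B8y_1_2.jpg` would come out as `.../2012/03/a%CC%8Adn%C3%B8y_1_2.jpg`, which matches the name actually on disk.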

Think Different, Cause Bizarre Interoperability Problems.




©0Day 2016 - 2023 | All Rights Reserved.