Talk:dodo

From twext

talk dodo

please, thanks!

rolando

About data in the filename: the idea has been debated for about 30 years, and unfortunately it all ended up with the so-called 'extensions' everybody uses for files (.jpg .html .txt).

ok so for english-only, has data in filename been not just debated but tried? it'd be great to learn from examples.. eg google "database in filename system"

data in filename

Maybe i didn't explain it right. The domains are not a problem at all, so no need for IDN.

dodo needs idn because it wants meaningful urls
can domains act like file system? all dodo wants is easy data

The problem is with the names of the files themselves.

see below

The character set was never standardized across operating systems (Windows, for instance, is a huge problem because it carries a lot of baggage: legacy code and support for long-dead filesystems; more about that at the end), so it's very difficult to guarantee that the name of a file can be used transparently, without problems, on the web or in every operating environment.

Also, every system has arbitrary limits on the length of filenames, which ties into limits on how many directories you can nest and on the total length of the pathname. Some systems (notably Windows) have a hard time handling letter capitalization, even with the English ASCII character set only, because they never supported case sensitivity natively or don't care about it, and thus make a big mess out of it.

Furthermore, the CD filesystem (ISO 9660) needs special encoding for filenames, and it's different from the DVD filesystem (UDF). The Windows filesystems (NTFS/FAT32) are in turn very different from those of MacOS (HFS+) and Linux (ext3/reiserfs). Any transfer from one type to another may require re-encoding the filenames, which makes them volatile and thus incapable of preserving their data intact. And that's without even mentioning the way web servers and databases handle character encoding.
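To make the three problems above concrete (character set, length limits, capitalization), here is a minimal Python sketch that flags names likely to get mangled when moved between systems. The function name `portability_problems` and the 255-byte default are assumptions for illustration only; real limits differ from one filesystem to the next.

```python
def portability_problems(names, max_bytes=255):
    """Return (name, reason) pairs for names that may not survive a move
    between filesystems. max_bytes=255 is an assumed, typical limit."""
    problems = []
    seen_folded = {}  # case-folded name -> first spelling seen
    for name in names:
        # Anything outside ASCII may need re-encoding on another system.
        if not name.isascii():
            problems.append((name, "non-ASCII characters"))
        # Length is measured in encoded bytes, not in characters.
        if len(name.encode("utf-8")) > max_bytes:
            problems.append((name, "longer than %d bytes" % max_bytes))
        # Two names that differ only by case collide on a
        # case-insensitive filesystem.
        folded = name.casefold()
        if folded in seen_folded and seen_folded[folded] != name:
            problems.append(
                (name, "differs only by case from " + seen_folded[folded])
            )
        seen_folded.setdefault(folded, name)
    return problems
```

For instance, passing `["Report.txt", "report.txt"]` flags the second name as a case-only duplicate of the first.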

As I mentioned before, the best way to add important data to a file is using a metadata system. Please read the following articles:

http://en.wikipedia.org/wiki/Metadata (especially the section about file system metadata)
http://en.wikipedia.org/wiki/Resource_fork (this is an old and very successful implementation of metadata in a file system)

oof!

I'll be in Consol, hope to see you there. I'll introduce you to some programmer friends, too. Rolando

PS: Let me give you a horrifying example to illustrate this further: 20-something years ago, Microsoft could only use 8.3 filenames in DOS (8 characters in the filename, 3 in the 'extension', case insensitive) because of their own limitations. When Windows 95 launched, Microsoft added a 'new technology' called VFAT to the old 8.3 format that enabled long filenames (up to 255 characters), but internally the filesystem still used 8.3, so that two files named My_Nice_Weekend_In_Alaska.jpeg and My_Nice_Gilfriend_Jodie.jpeg would really be named MY_NIC~1.JPE and MY_NIC~2.JPE, respectively. How's that for great software technology?
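A rough Python sketch of how such short aliases come about, using the two filenames from the example. This is a simplification and not Microsoft's actual algorithm: the helper name `short_name` is made up, every name gets a `~N` suffix here (even one that would already fit in 8.3), and the character filtering is only approximate.

```python
def short_name(long_name, taken):
    """Build a simplified 8.3-style alias for long_name.

    `taken` is a set of aliases already used in the directory; the new
    alias is added to it. Not the real VFAT algorithm, just the idea.
    """
    base, dot, ext = long_name.rpartition(".")
    if not dot:  # no extension at all
        base, ext = long_name, ""

    def clean(part):
        # Upper-case and keep only a conservative set of characters.
        return "".join(c for c in part.upper() if c.isalnum() or c in "_-")

    base, ext = clean(base), clean(ext)[:3]  # extension cut to 3 chars
    n = 1
    while True:
        suffix = "~%d" % n
        # Truncate the base so that base plus suffix fits in 8 characters.
        alias = base[:8 - len(suffix)] + suffix
        if ext:
            alias += "." + ext
        if alias not in taken:
            taken.add(alias)
            return alias
        n += 1  # alias already used, try the next number
```

Calling it on the two names in order, with one shared `taken` set, gives MY_NIC~1.JPE and then MY_NIC~2.JPE. Nothing in the alias tells you which photo is which.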

Now guess what? The real, long filename is stored inside the filesystem as metadata!

these are the ones dodo would want to work with.. these are the ones humans could read.. the idea is an easy to understand layer between complex filesystems and actual filenames.. dodo wants to work with apparent filenames.. like a humane layer between machine and human.. easy to read and share data..

Great! But wait, there's more: This so-called 'technology' has been carried on to this day with every version of Windows up to Vista (although not used as the default anymore). Scary, huh? You can read more about it here: http://en.wikipedia.org/wiki/Vfat

thanks again Rolando! cya at consol :)

multilingual

The problem with that idea is that multilingual and character support is almost impossible to do right, due to many factors, such as the terrifying implementation of long filenames and international character support in Windows. Then there's the problem of supporting it on the web, where all non-ASCII characters are converted to their URL-encoded counterparts.

dodo wants to be easy to share so priority is to share online.. so we'd want to start with URL encoded counterparts first.. which is making progress: http://en.wikipedia.org/wiki/Internationalized_domain_name

if url-ish filenaming or something near, maybe better:
3..2..1..0...txt
how..when..who..what.domain.tld
001.000000-000000-000000.gabriel.bad_is_good.domain.tld
bassackwards..as Jergas points out, using dns as filename needs propagation via dns.. so maybe less instant gratification, but maybe findable data.. probably more like totally clueless here.. especially timestamp in url, beyond idiocy.. nevertheless, a dream in meaningful urls is to make the browser url field a slick search interface, ie:
putstring..twext.cc = search "putstring"
my lack of DNS knowledge here is woeful, sorry.. it's just a wicked itch to use that URL as easy resource finder..
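Purely as a sketch of the how..when..who..what.domain.tld idea above, this is one way such a name could be split back into its fields in Python. The field meanings, the `DodoName` tuple and the `parse_dodo_name` helper are all guesses made for illustration, and the doubled-dot form (putstring..twext.cc) is not handled.

```python
from collections import namedtuple

# Hypothetical field layout, read off the example name in the text.
DodoName = namedtuple("DodoName", "how when who what domain")

def parse_dodo_name(name):
    """Split 'how.when.who.what.domain.tld' into named fields."""
    parts = name.split(".")
    if len(parts) < 6:
        raise ValueError("expected how.when.who.what.domain.tld, got %r" % name)
    how, when, who, what = parts[:4]
    # Whatever follows the four data fields is treated as the domain.
    return DodoName(how, when, who, what, ".".join(parts[4:]))

name = parse_dodo_name("001.000000-000000-000000.gabriel.bad_is_good.domain.tld")
print(name.who, name.what, name.domain)
```

A layout like this only works while no field contains a dot of its own, which is one more constraint on what the data can say.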

Just an example: suppose I have a file on my server called felizañonuevo2008.txt. You have to type http://example.tld/feliza%C3%B1onuevo2008.txt in the browser to find it properly, and then, if the file is not there (which is true), the server will respond using its own encoding scheme: "/felizañonuevo2008.txt was not found"

Now let's see a Japanese example:

my file is called 明けましておめでとう2008.txt, then you need the URL encoding for the browser: http://example.tld/%E6%98%8E%E3%81%91%E3%81%BE%E3%81%97%E3%81%A6%E3%81%8A%E3%82%81%E3%81%A7%E3%81%A8%E3%81%862008.txt and if there's no file, the server replies with "/明ã '㠾㠗㠦㠊゠㠧㠨㠆2008.txt was not found"
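Both examples can be reproduced with Python's standard library. The sketch below percent-encodes a filename as UTF-8, and also shows one plausible way a garbled reply comes about: the UTF-8 bytes get read back as a single-byte encoding. That the server in the example did exactly this is an assumption, and the helper names are made up.

```python
from urllib.parse import quote, unquote

def url_path_for(filename):
    """Percent-encode a filename (as UTF-8) for use as a URL path."""
    return "/" + quote(filename, safe="")

def misdecoded(filename, wrong_encoding="latin-1"):
    """Show what the name looks like when its UTF-8 bytes are decoded
    with the wrong (single-byte) encoding."""
    return filename.encode("utf-8").decode(wrong_encoding)

print(url_path_for("felizañonuevo2008.txt"))
print(url_path_for("明けましておめでとう2008.txt"))
print(unquote("/feliza%C3%B1onuevo2008.txt"))  # decoding restores the name
print(misdecoded("ñ"))                         # two junk characters for one ñ
```

The encoding itself round-trips cleanly. The damage only happens when one side guesses the wrong character set.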

Worst of all, how can you type the name of a Spanish file if you don't have the ñ character on your keyboard? And how about the kanji character set?

human twext translators will need keyboards.. casual users of twext in a new language should be able to point and click.. if they become serious, they'll get the keyboards they need..

Try it yourself. Nice, huh? Standardizing filenames and making them useful is not a bad idea at all, but the world is a big mess.

The consensus around the industry is that it's a lot better to keep metadata in special files, as MacOS has been doing for about 15 years now with its (in)famous resource forks, or inside the filesystem itself, which Linux already supports through extended attributes in ext2/ext3 and some other pluggable filesystems like ReiserFS.
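As a small illustration of the filesystem route, Python exposes Linux extended attributes through `os.setxattr` and `os.getxattr`. The sketch below is hedged on purpose: those calls exist only on Linux, the filesystem has to allow `user.` attributes, and the helper names `tag_file` and `read_tag` are invented for this example.

```python
import os
import tempfile

def tag_file(path, key, value):
    """Try to store `value` under the extended attribute user.<key>.
    Returns True on success, False if the platform or filesystem refuses."""
    if not hasattr(os, "setxattr"):  # not Linux
        return False
    try:
        os.setxattr(path, "user." + key, value.encode("utf-8"))
        return True
    except OSError:  # e.g. filesystem without user xattr support
        return False

def read_tag(path, key):
    """Read user.<key> back, or return None if it is missing or unsupported."""
    if not hasattr(os, "getxattr"):
        return None
    try:
        return os.getxattr(path, "user." + key).decode("utf-8")
    except OSError:
        return None

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        if tag_file(f.name, "title", "feliz año nuevo 2008"):
            print(read_tag(f.name, "title"))
        else:
            print("extended attributes not available here")
```

The filename stays plain ASCII while the human-readable title travels as metadata, although such attributes are easily lost when the file is copied to a filesystem or over a protocol that doesn't carry them.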

The thing is, almost nobody comes up with a good implementation that really uses it. MacOS already has very good support for files associated with applications that are properly registered with the operating system. That enables many things, like very nice and fast previews of a file's contents in the file manager, or applications that recognize and sort their own files. It's the most advanced example of the idea by far.

On the Linux front, Nautilus tries to do some of that too, but right now it doesn't make extensive use of metadata and is therefore not really useful. If you're interested in pursuing the idea, I'd recommend you focus on implementing a better metadata system in Nautilus (one that takes advantage of both metadata files and filesystem extensions) that gives applications a nice standardized way to program support for it (an API or library would be nice).

nautilus looks like way more than i can handle for now.. but thanks for the link!.. for now we're looking at twexml
thanks Rolando! very helpful and much appreciated!

semi related

The vocabulary can differ from one data set to the next, and that's fine. In fact it's great, because people want and need to express things in the ways that make sense to them... But on a deeper level the data files are compatible. It's easy to make equivalences between, say, title, in the MIT data, and position, in the Columbia data... Going a step further, the amazing David Huynh -- who is responsible for many of Project SIMILE's innovative web applications -- has created a tool that almost anybody could use to make those equivalences... There isn't yet a Strunk and White for URI design, but the recent book RESTful Web Services, by Sam Ruby and Leonard Richardson, does touch on the subject... (/parent/child), commas to encode ordered siblings (/parent/child1,child2), and semicolons to encode unordered siblings (/parent/red;green)... LibriVox, the collaborative project to make audio recordings of public domain books... --Jon Udell

idn

rolando says data in filename not new, maybe not bad idea, but multilingual filenaming is big mess

dodo data wants to be shared online.. IDN international domain naming is testing now:

"pressure to get the international domain names working because some nations, in particular China, are working on their own technology to support their own character sets."

namespace

dodo wansta play with numbered namespace, in some kinda way that is humane

just messin around here.. maybe dodo can be a humanizing layer between simple and complex:
