18


we need to

  • generally discuss this
  • will a plugin help?
  • define a clear plan
  • break it down into achievable milestones..

18 is an alt page for multilingual and i18n.. simplify:

  • file storage retrieval system
  • prefs
  • slop
  • find
  • addlang
  • complete interface




PLEASE SEE 19 FOR DETAILS

image:POO.gif

complete interface almost done..
full spec by then..

image:ADDLANGRUFF.gif
oi ve.. language management.. oof!










image:Find.gif
will be integrated w/ above..


simplify

.twx describes filename=db proposal.. maybe
too complex at this demo stage? 
we don't need #tags yet..
we could start more basic:

currently in xml we have:
<twext>
  <para>
    <line>
      <text>hello</text>
      <twxt>hola</twxt>
    </line>
  </para>
</twext>

AFAIK, the minimum "meta" data we need in the file is:
title
twxt_lang 
TEXT_LANG


we can prepare to add more in the future, e.g.:
date_created
date_modified
author
owner
license
editors (translators)
tags

so, if we restrict new xml tags to the minimum, we add:
<twext>
  <title>TITLE OF DOCUMENT</title>
  <TEXT_LANG>ENGLISH</TEXT_LANG>
  <twxt_lang>español</twxt_lang>
  <para>
    <line>
      <text>hello</text>
      <twxt>hola</twxt>
    </line>
  </para>
</twext>

now we can save a title within twext language
categories (no more need for twext folders) 
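a minimal sketch of writing and reading this format with Python's standard `xml.etree.ElementTree`, assuming exactly the three meta tags above and one `(text, twxt)` pair per `<line>`; `build_twext` and `read_meta` are hypothetical helper names, not existing twexter code:

```python
import xml.etree.ElementTree as ET


def build_twext(title, text_lang, twxt_lang, paras):
    """Build a minimal twext tree (hypothetical helper).

    paras is a list of paragraphs; each paragraph is a list of
    (text, twxt) pairs, one pair per <line>.
    """
    root = ET.Element("twext")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "TEXT_LANG").text = text_lang
    ET.SubElement(root, "twxt_lang").text = twxt_lang
    for para in paras:
        p = ET.SubElement(root, "para")
        for text, twxt in para:
            line = ET.SubElement(p, "line")
            ET.SubElement(line, "text").text = text
            ET.SubElement(line, "twxt").text = twxt
    return root


def read_meta(root):
    """Return the three minimum meta fields as a dict."""
    return {tag: root.findtext(tag) for tag in ("title", "TEXT_LANG", "twxt_lang")}


if __name__ == "__main__":
    doc = build_twext("TITLE OF DOCUMENT", "ENGLISH", "español", [[("hello", "hola")]])
    print(ET.tostring(doc, encoding="unicode"))
    print(read_meta(doc))
```

keeping the meta tags as direct children of `<twext>` means `findtext` can read them without walking the paragraphs.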

probably the simplest system will
* achieve the intended demo result
* provide user feedback on what to do next

the intent of http://twext.cc/go/18#.twx is to make a
human-readable summary of findable data, easily accessible
and easy to improve..

if we can make this very simple multilingual
twexter work w/ http://twext.cc/go/18#.twx,
that might be cool.. but if it's really complex,
better to keep it simple for now..

if we try .twx

if .twx uses filename=db and reads array definitions
from RIGHT TO LEFT, then maybe a .twx file is named like this:

..twextlanguage.TEXTLANGUAGE..title..twx_version..timestamp.twx

* timestamp is our always-unique resource locator
* twx_version maps the meta data, i.e.:
  ..3..2..1..0.twx    = map for .twx version 0.1
  ..twx.TXT..         = twext language.TEXT LANGUAGE (case sensitive)
  ..title             = document title
  ..twx_version_0.1   = version of .twx array
  ..timestamp         = unique resource locator

then a filename might look like this:
..español.ENGLISH..Example-of-a-Filename..0.1..20060425.122022.0.twx
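one way the RIGHT TO LEFT reading could work, sketched in Python; the hyphen-for-space rule in titles is an assumption taken from the example name, and `make_twx_filename` / `parse_twx_filename` are hypothetical:

```python
def make_twx_filename(twxt_lang, text_lang, title, version, timestamp):
    """Assemble ..twxt.TEXT..title..version..timestamp.twx (hypothetical)."""
    # assumption: spaces in the title become hyphens, as in the example name
    safe_title = title.replace(" ", "-")
    return f"..{twxt_lang}.{text_lang}..{safe_title}..{version}..{timestamp}.twx"


def parse_twx_filename(name):
    """Read the slots back right to left: timestamp, version, title, langs."""
    if not (name.startswith("..") and name.endswith(".twx")):
        raise ValueError(f"not a .twx name: {name!r}")
    body = name[2:-len(".twx")]
    # rsplit works from the right, so the timestamp is peeled off first
    langs, title, version, timestamp = body.rsplit("..", 3)
    twxt_lang, text_lang = langs.split(".", 1)
    return {
        "twxt_lang": twxt_lang,
        "TEXT_LANG": text_lang,
        "title": title.replace("-", " "),
        "version": version,
        "timestamp": timestamp,
    }


if __name__ == "__main__":
    name = make_twx_filename("español", "ENGLISH", "Example of a Filename",
                             "0.1", "20060425.122022.0")
    print(name)
    print(parse_twx_filename(name))
```

`rsplit` from the right is what makes right-to-left reading cheap: the timestamp is always the last slot, whatever comes before it. the catch: a title that really contains a hyphen comes back with a space.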

the contents of the file might look like this:
<twext>
   <time>20060425.122022.0</time>
   <version>0.1</version>
   <title>Example of a Filename</title>
   <TXT>ENGLISH</TXT>
   <twx>español</twx>
   <para>
      <line>
         <text>hello</text>
         <twxt>hola</twxt>
      </line>
   </para>
</twext>

.twx

.twx is now called dodo

.twx: maybe we turn titles into somekinda database
so we can define and find data in a ..numbered namespace..

..n..twx

..0..1..2..3..4..twx
..what..where..when..who..etc..twx
..title..langs..time..editors..tags..twx
..long_title..ENGLISH.espanol..200714.093422.1..waqas,duke..file_system,spot,db..twx

printing a report is easy:
title: LONG HONKIN TITLE
langs: ENGLISH.espanol
clock: 200714.093422.1
edits: waqas, duke
mtags: file system, spot, meaningful url, twexter, db
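a sketch of reading such a name into its numbered slots, printing the report, and searching one ..n.. slot, assuming the five slot names above and that no value contains ".."; all function names are hypothetical:

```python
# slot names from the ..0..1..2..3..4..twx example above
FIELDS = ["title", "langs", "time", "editors", "tags"]


def parse_numbered(name):
    """Split a ..n..twx name into a dict of its numbered slots."""
    if not (name.startswith("..") and name.endswith("..twx")):
        raise ValueError(f"not a ..n..twx name: {name!r}")
    values = name[2:-len("..twx")].split("..")
    return dict(zip(FIELDS, values))


def report(record):
    """Print-ready summary; underscores in values become spaces."""
    def words(s):
        return s.replace("_", " ")
    return "\n".join([
        "title: " + words(record["title"]),
        "langs: " + record["langs"],
        "clock: " + record["time"],
        "edits: " + ", ".join(record["editors"].split(",")),
        "mtags: " + ", ".join(words(t) for t in record["tags"].split(",")),
    ])


def search_slot(names, slot, needle):
    """Names whose given slot contains needle; other slots are ignored."""
    return [n for n in names if needle in parse_numbered(n).get(slot, "")]


if __name__ == "__main__":
    name = "..long_title..ENGLISH.espanol..200714.093422.1..waqas,duke..file_system,spot,db..twx"
    print(report(parse_numbered(name)))
    print(search_slot([name], "editors", "duke"))
```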

search could focus on a specific ..n.. slot to find data..

timestamp

a timestamp per save marks a single point of truth in memory..
so a .twx file does not need a title (..0..) and thus can maybe serve
dichos and citage.. a twext_object?
FileMaker has a nice way to save.. it just saves, while timestamping? maybe a ram twext object? save in temp ram until stable, then timestamp..
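if the trailing part of a stamp like 20060425.122022.0 is a per-second counter (an assumption; the notes don't say), a save clock could look like this. `TwxClock` and `Draft` are hypothetical; `Draft` only models the "keep it in ram until stable, then timestamp" idea:

```python
from datetime import datetime


class TwxClock:
    """Hypothetical helper that hands out YYYYMMDD.HHMMSS.n stamps.

    Assumption: n counts saves within the same second, so two saves
    never get the same stamp.
    """

    def __init__(self):
        self._last_second = None
        self._seq = 0

    def stamp(self, now=None):
        if now is None:
            now = datetime.now()
        second = now.strftime("%Y%m%d.%H%M%S")
        if second == self._last_second:
            self._seq += 1
        else:
            self._last_second, self._seq = second, 0
        return f"{second}.{self._seq}"


class Draft:
    """Keep edits in ram; only commit() gives them a timestamp."""

    def __init__(self, clock):
        self.clock = clock
        self.text = ""

    def edit(self, text):
        self.text = text  # no stamp yet, nothing written

    def commit(self):
        return self.clock.stamp(), self.text
```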

history

if a file is saved (w/ timestamp but no other info), then later
opened and modified, the new file should be associated with
the original.. maybe if ..0.. = timestamp, then:
..20070425.123456.789_20070422.123456.789..
?
saving file histories is needed for wiki-like revert
capabilities.. together, files record a process..
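a sketch of the `_`-chained ..0.. slot, assuming the newest stamp sits on the left as in the example; helper names are hypothetical:

```python
def new_history(new_stamp, parent=None):
    """..0.. value for a save: newest stamp first, then the parent's chain."""
    return f"{new_stamp}_{parent}" if parent else new_stamp


def parent_of(history):
    """..0.. value of the version this one came from (None for an original)."""
    _, sep, rest = history.partition("_")
    return rest if sep else None


def lineage(history):
    """Every stamp in the chain, newest first."""
    return history.split("_")


if __name__ == "__main__":
    original = new_history("20070422.123456.789")
    edited = new_history("20070425.123456.789", original)
    print(edited)             # the example chain from the notes
    print(parent_of(edited))  # what a wiki-like revert would reopen
```

each save makes the chain longer, so long-lived documents would get long names; that may be where garbage collection comes in.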

twext object

timestamp creates a twext object ..time..twx 
further definable by languages, tags, title..

mtags

tags (google labels) categorize info.. 
can simple markup make tags meta?
a;author
l;license

so, if ..4.. = ..mtags.. then in ..4..:
..l;link,c;credit,a;author,l;loop,w;whateva.. simple arrays
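a sketch of splitting an mtags slot into prefix groups, assuming "," separates items and ";" ends the prefix; `parse_mtags` is hypothetical:

```python
from collections import defaultdict


def parse_mtags(slot):
    """Group ..p;value,p;value.. items by their prefix."""
    tags = defaultdict(list)
    for item in slot.strip(".").split(","):
        item = item.strip()
        if not item:
            continue
        prefix, sep, value = item.partition(";")
        if sep:
            tags[prefix].append(value.strip())
        else:
            tags[""].append(item)  # plain tag with no prefix
    return dict(tags)


if __name__ == "__main__":
    print(parse_mtags("..l;link,c;credit,a;author,l;loop,w;whateva.."))
```

grouping makes one problem visible: "l" collects both link and loop, and above it also meant license, so prefixes probably need to be unique (or longer).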

translatable

if
  1A=A3
  2B=B1
  3C=C2
then
  1=3
  2=1
  3=2

so could be rearranged right to left:
..who..tag..when..where..what..twexml

disorder, reordered:
..wtf..what..where..who..when..twexml

..3..2..1..0..twx

or mapped to new versions of .twx data?
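one reading of the 1=3, 2=1, 3=2 example is a position map stored per .twx version, so data saved in one order can be moved into another. a sketch, assuming 1-based slots as in the notes; `remap` and `invert` are hypothetical:

```python
def remap(fields, position_map):
    """Move each value to its new slot.

    position_map uses 1-based slots, as in the notes: {1: 3} means
    "the value in slot 1 goes to slot 3".
    """
    slots = list(range(1, len(fields) + 1))
    if sorted(position_map) != slots or sorted(position_map.values()) != slots:
        raise ValueError("position_map must be a permutation of the slots")
    out = [None] * len(fields)
    for old, new in position_map.items():
        out[new - 1] = fields[old - 1]
    return out


def invert(position_map):
    """Map back from the new order to the old one."""
    return {new: old for old, new in position_map.items()}


if __name__ == "__main__":
    v1_to_v2 = {1: 3, 2: 1, 3: 2}            # 1=3, 2=1, 3=2
    print(remap(["A", "B", "C"], v1_to_v2))  # ['B', 'C', 'A']
```

because `invert` undoes `remap`, a file saved under an old version could be read by a newer one and written back without losing slots.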

garbage collection

?

find

turn the url field into a search interface, i.e. if
twext.cc/x;english,xx;espanol..a;lennon
is input directly into the url field,
then twexter searches twext.cc for
"ENGLISH" TEXT with "spanish" twext by author "Lennon"

or search by date and language pair:
twext.cc/t;20070425,x;english,xx;espanol

etc
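a sketch of turning such a url into search criteria, assuming both "," and ".." separate items and that the prefixes mean what the examples suggest (x = TEXT language, xx = twext language, a = author, t = date); `parse_find`, `matches` and the `KEYS` table are hypothetical:

```python
from urllib.parse import unquote

# assumed meanings of the prefixes used in the examples above
KEYS = {"x": "TEXT_lang", "xx": "twxt_lang", "a": "author", "t": "date"}


def parse_find(url):
    """Turn twext.cc/x;english,xx;espanol..a;lennon into search criteria."""
    query = unquote(url.rsplit("/", 1)[-1])  # text after the last "/"
    criteria = {}
    # assumption: both "," and ".." separate items
    for item in query.replace("..", ",").split(","):
        key, sep, value = item.partition(";")
        if sep and value:
            criteria[KEYS.get(key, key)] = value
    return criteria


def matches(doc, criteria):
    """True if every criterion equals the doc's field, ignoring case."""
    return all(str(doc.get(k, "")).lower() == v.lower() for k, v in criteria.items())


if __name__ == "__main__":
    print(parse_find("twext.cc/x;english,xx;espanol..a;lennon"))
```

`unquote` matters because browsers percent-encode non-ASCII, so español arrives as espa%C3%B1ol.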

feedback

  • roberto liked it, "if implemented right"..
  • gerardo said ok, but not sure about xml..
  • jergas thought it might work fast =)
  • waqas says maybe worth a try
    • having a specific string at a specific number causes a backward compat problem.. solved by version? ideally the whole twext system will associate strings w/ numbers for translating between versions/languages
    • mysql searches fast.. maybe .twx could search fast too?
    • a php app would need .net or java search? oi ve.. io? or ruby or lucene?
  • evert?
  • zura?





image:COW.png
easier to just input direct to twext?













one way xcroll can xcroll:
image:XCROLLLOOKFEEL7.gif
another:
image:GO2.gif
the above xcroll is an experiment.. nonessential

prefs

the simplest possible interface for the user to control styles, output, and which languages to include/exclude

slop

Select Language OPtions, combined w/ prefs from xnav:
image:WONKfool.gif

find

search ..db..twx to find info in the numbered namespace

addlang

the simplest possible interface to add a language

complete interface

  • visitor visits
  • finds twext
  • edits twext
  • saves files
  • controls styles
  • controls languages
  • creates account
  • saves prefs

image:XCROLLLOOKFEEL7.gif

MULTILINGUAL TWEXTER SPEC
multilingual twext helps us save, manage, and find twext texts..

evert says XUL.. we have twexml, maybe w/ ..db.. files?

  • output:

"flexible" means it's easy to let other programs read data in a saved file as a single point of truth with variable output, then convert it to various formats:

hopefully ..db..twx solves the issues below:

searchable?

you tell me.. it sounds like XML lets us define a new doc type, then perform searches that find only our new type of doc.. even filter specific searches within our new doc type.. if so, our new xml doc type might implement these features:

language identification

give the user the option to identify the languages of the TEXT and twext parts of a saved file.. for example, within a document:

  • TEXT may be kiswahili
  • twxt may be portugues

if someone is in brazil learning kiswahili, they'll want to find twext docs with big kiswahili TEXT supported by little twxt in portugues..

the design should anticipate trilingual and multilingual content within a single document.. at a root level, we might try to avoid being trapped by categories
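a sketch of the brazil-learning-kiswahili search, assuming each saved doc records one TEXT language and a list of twxt languages (a list, so trilingual docs fit); `TwextDoc` and `find_by_langs` are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class TwextDoc:
    title: str
    text_lang: str                                  # language of the big TEXT
    twxt_langs: list = field(default_factory=list)  # one or more twxt languages


def find_by_langs(docs, text_lang, twxt_lang):
    """Docs whose TEXT is in text_lang and that carry twxt in twxt_lang."""
    want_text = text_lang.casefold()
    want_twxt = twxt_lang.casefold()
    return [
        d for d in docs
        if d.text_lang.casefold() == want_text
        and want_twxt in (lang.casefold() for lang in d.twxt_langs)
    ]
```

matching ignores case because the notes write TEXT languages in UPPERCASE and twext languages in lowercase.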

tags

let the user simply add tags and maybe somekinda shortcut smart tag.. tags are good because data isn't trapped by categories.. categories are good because they help sort data.. tags are a great example of being able to categorize data without trapping it in some freaking folder somewhere.. we should find an easy, flexible way to include and modify tag info in our xml docs..

  • t; idea, half-baked (suggestion tag)
  • a; last name, first name (author tag)
  • o; yadda yadda, inc (owner/publisher tag)
  • x; translator name (translator tag)

the idea is to make it easy to tag a twext file w/ info that's easy to sort.. i.e. if a line starts with "x;" then the doc may include translator info.. repeat: i'm new to xml so this may sound real stupid..
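a sketch of turning such tag lines into xml, assuming a `<tag type="...">` element (not part of the current format) and the four prefixes above; lines without a known prefix are left alone. `TAG_TYPES`, `add_tag_lines` and `tags_of_type` are hypothetical:

```python
import xml.etree.ElementTree as ET

# the four prefixes listed above; the type names are illustrative
TAG_TYPES = {"t": "suggestion", "a": "author", "o": "owner", "x": "translator"}


def add_tag_lines(root, lines):
    """Append a <tag type="..."> element for each recognised tag line."""
    for raw in lines:
        prefix, sep, value = raw.partition(";")
        if not sep or prefix.strip() not in TAG_TYPES:
            continue  # not a tag line, leave it alone
        tag = ET.SubElement(root, "tag", type=TAG_TYPES[prefix.strip()])
        tag.text = value.strip()
    return root


def tags_of_type(root, kind):
    """All tag values of one type, in document order."""
    return [t.text for t in root.findall(f"tag[@type='{kind}']")]
```

a `type` attribute keeps every tag in one element name, so new prefixes only need a new entry in the table.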

title

3rd in line.. titles are useful, but tags might be more useful.. especially in the context of delyric citage.. another type of title-less twext object may be dichos (simple quotes).. maybe titles could just be tags? please don't hurt me

twext folders are toast.. ..db..twx can identify language pair and multilingual..


below, mockups will digest the print above..

SLOP select language options menus

Select Language Option Menus might be a DHTML trick.. basically, drop-down menus loaded with user_pref languages.. i've seen some work very well, others not.. so maybe two ways to go:

  1. slick DHTML (precursor to xnav)
  2. old school html (ugly but widely useful)

SLOP lets user Select Language OPtions.

image:Slop1.png

slop uses dynamic menus: twexter responds to changes in either menu instantly, with no need for any additional input button.

the interface language is controlled by the twext_slop_menu on the left.

so if the user changes twext_slop_menu from english to español, then the interface language changes from english to español

image:Slop2.png

on the right, the TEXT_slop_menu controls the TEXT language; this menu directs search queries, chunktext inputs or format fetches to internal xml format

so if the user changes TEXT_slop_menu from ESPAÑOL to ENGLISH, queries, fetches and inputs will interact with xml

image:Slop3.png

the TEXT_slop_menu does not control the interface language, with one exception: the word for "TEXT" (left of TEXT_slop_menu) should be in the language selected by TEXT_slop_menu. confusing? either eliminate the "twext" and "TEXT" labels (rely on lowercase in twext_slop and UPPERCASE in TEXT_slop) or just go DHTML:

image:Slop4.png

solutions must be robust, bug-free, strong on most browsers and work with unicode.

if the user selects "+" in any SLOP MENU

image:Slop5.png

temporarily save the user's input so the user can be sent back to the page without losing it, then send the user to prefs:

XPREF twext user preferences

to multilingualize, we need to identify languages.. adding languages to twext will hopefully be very easy, so many languages, even dialects, slangs, hybrids may be added.. too many..

users are likely to want to exclude many languages from their twext interface.. users may also want to control personal preferences for format, style and output of twext text between any two languages..
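a sketch of how prefs might feed a SLOP menu: excluded languages drop out, the language in use always stays, TEXT menus show UPPERCASE and twext menus lowercase, and "+" comes last to lead to prefs. `slop_options` is hypothetical:

```python
def slop_options(all_langs, excluded, current, text_menu=False):
    """Entries for one SLOP menu, built from the user's prefs (hypothetical)."""
    options = [lang for lang in all_langs if lang not in excluded]
    if current not in options:
        options.insert(0, current)  # never hide the language in use
    # TEXT_slop shows UPPERCASE, twext_slop shows lowercase
    options = [lang.upper() if text_menu else lang.lower() for lang in options]
    options.append("+")  # "+" sends the user to prefs
    return options


if __name__ == "__main__":
    langs = ["english", "español", "kiswahili", "português"]
    print(slop_options(langs, {"kiswahili"}, "english"))
    print(slop_options(langs, {"kiswahili"}, "english", text_menu=True))
```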

http://twext.cc/dev-old/xo/control.html shows ancient attempt at user preferences interface.. soon regurgitating here..

FIND

hopefully "internal xml" output will produce flat files that, if stored in urls, can easily be searched with your favorite search engine.. hopefully, we can add tags to such "internal xml" output to filter searches by language, author, translator, generic tags, etc..

is this possible with xml output we're defining?

waqas?

question re: naming convention

  • "internal_xml_output"

twext means lots of things..

  • for end users, a "twext" file should be something easy for them to find in the languages they want..
  • for developers, "twext" means
    • chunk translation input
    • xml output
    • xslt conversion to many end file formats

what should we call "twext internal xml output"? xxml? xmlx? twexml? twextxml?

waqas?

ADDLANG add languages to system

before adding langs to twext, user should be able to

  • PREView
  • SAVE
  • PREF
  • FIND
  • SLOP

addlang should cost minimal user work, and avoid being trapped by categories.. when preview, save, prefs, find, slop work, then we can add languages to twexter:

image:TwexterAddMaya.gif


 

also see http://twext.cc/dev/dev2006.html

UI for multilingual twexter:

  • paper prototype: does not explain how the interface is supposed to work, but merely simulates what the interface would do. In this manner, you can identify which parts of the interface are self-explanatory and which parts are confusing.

1. BASIC TWEXTER SPEC defines an interface people might use to get twext for a text.. it may have useful features:

  • INPUT
    • stoopid, not hard to understand for a user
    • xcroll easily edit/compare text/translation
    • input/preview combined in single interface
  • OUTPUT
    • solve html table prob
    • format adjustable
    • XML ready > CSS

2. MULTILINGUAL TWEXTER:

  • save and find basic twexter files, identified by attributes like language_pair, tags, title, author, owner, translator, license, etc..
  • prefs let us limit our focus to languages we're learning and let us customize our twext experience
  • slop makes it easy for us to navigate languages
  • addlang lets people easily add languages, dialects, slangs to get twext

3. AUTOMAGIC TWEXTER:

  • xtext fast, flexible twext line break service
  • chunxa automatically chunks TEXT to get twext
  • mtport connects machine translation to chunxa
  • xcat connects chunxa + mtport + basic interface
  • xurl automagically twext translates urls

comment: automagic twexter is likely to suck at first but will hopefully learn to suck less.. xcat might help machines learn from human corrections.. at this stage, twexter may be useful to OmegaWiki

 

Retrieved from "http://twext.com/18"