lyric chunkster

LYRIC CHUNKSTER FOR TWEXTER IN PHP

this should be a simple job for a developer with solid experience in PHP, maybe AJAX and definitely OPEN SOURCE collaborative software development.

MILESTONES:

  1. CHUNK LYRICS
    1. CHUNK LINES AND CHORUS
    2. CHUNK WITHIN LINES
    3. CLEANUP
  2. TRANSLATE CHUNKS
    1. INTERMEDIARY INTERFACE FROM HTTP://TRANSLATE.GOOGLE.COM TO HTTP://TWEXT.CC/TWEXTER
  3. COMPLETE
    1. PLUGIN TO TWEXTER PHP CORE
    2. HTTP://TEST.TWEXT.CC
    3. FULL COMMENTS, PSEUDOCODE
    4. CREDITS, COPYRIGHT, LICENSE
    5. SHARE AT HTTP://SF.NET/projects/twexter

background

the chunkster software:

  • "chunks" text,
  • sends chunked text to http://translate.google.com, where the user identifies the ordered pair of languages he will be working with,
  • gets chunk translations,
  • then puts chunked text and translation into proper textarea input fields at http://twext.cc/twexter

"chunking" text means get text input, insert an ordered set of returns into the text, then output the text.. the returns are inserted in pattern as per method..

lyric chunking is much simpler than text chunking.. we don't have to solve the text chunking problem now.. we do have to solve the lyric chunking problem now.. solve the problem how you want.. know that even the text chunking problem was once sorta solved with this solution:

user defined text strings specify where returns are added to chunk a text.. returns are added three ways:

  1. BEFO before a specified text string
  2. AFTE after a specified text string
  3. BOTH before and after a specified text string

EXCEPTIONS to above rules were defined in related text fields.. exceptions let users instruct chunkster software NOT to insert returns before, after, or before and after specified, excepted text strings

ancient perl code at http://twext.com/dev/chunkster/chunkster.perl.txt once chunked text.. sorta..

the perl script referred to user modifiable textarea input fields like those illustrated here: image:ChunxtaExceptions.png

http://twext.cc/dev/dev2006.html#CHUNKSTA

a flowchart at http://twext.com/dev/chunkster.pdf and text at http://twext.cc/go/814 describe how the solution worked.. this problem is much simpler when focus narrows to simply CHUNK LYRICS..

CHUNK LYRICS

here is an example of a simple lyric text that lyric chunkster must chunk:

-------------------------------+
this line of lyrics to a song, 
another line, not as long      
                               
skip a line to start a chorus  
                               
                               
                               
                               
-------------------------------+

1. CHUNK LINES AND CHORUS

  1. inserts cursor at beginning of second line of text
  2. inserts return before the second line
  3. moves down one line
  4. inserts a new return
  5. loops steps 3 and 4 until the end of the text
  6. performs any cleanup needed to get this result:
-------------------------------+
this line of lyrics to a song, 
                               
another line, not as long      
                               
                               
skip a line to start a chorus  
-------------------------------+
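
the spacing rule behind the result above can be sketched in a few lines.. this is a python sketch of the logic only, not the PHP deliverable.. the function name chunk_lines_and_chorus is made up for illustration, and the sketch assumes that one or more empty lines in the input mark a chorus or stanza break.. instead of walking a cursor through the text, it splits the lyrics and rejoins them with the wanted number of returns:

```python
import re


def chunk_lines_and_chorus(lyrics):
    """Put one empty line between lyric lines and two before each new stanza.

    Hypothetical helper: the name and the stanza rule are assumptions.
    """
    # Normalise line endings, drop trailing spaces and outer empty lines.
    lines = [line.rstrip() for line in lyrics.replace("\r\n", "\n").split("\n")]
    text = "\n".join(lines).strip("\n")
    # Assumption: one or more empty lines mark a stanza (or chorus) break.
    stanzas = re.split(r"\n{2,}", text)
    # A single return between lines becomes two; a stanza break becomes three.
    return "\n\n\n".join("\n\n".join(stanza.split("\n")) for stanza in stanzas)


lyrics = (
    "this line of lyrics to a song, \n"
    "another line, not as long      \n"
    "                               \n"
    "skip a line to start a chorus  \n"
)
print(chunk_lines_and_chorus(lyrics))
```

splitting and rejoining avoids counting returns one by one, so the cleanup mentioned in the last step is reduced to trimming spaces and empty lines at the edges.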

2. CHUNK WITHIN LINES

refer to chunk criteria inputs as demonstrated at: http://twext.cc/dev/dev2006.html#CHUNKSTER

  1. find BEFO STRINGS: add one return BEFORE
  2. find AFTE STRINGS: add one return AFTER
  3. find BOTH STRINGS: add one return BOTH before AND after
if the BEFO STRINGS include: "to", "a"
and if AFTE STRINGS include: ","
and if BOTH STRINGS include: "this"
then the software adds returns to produce this result:
-------------------------------+
                               
this                           
line of lyrics                 
to                             
a song,                        
                               
                               
another line,                  
not as long                    
                               
                               
skip                           
a line                         
to start                       
a chorus                       
-------------------------------+
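
one way to get the result above, sketched in python to show the logic only (the deliverable is PHP).. the names is_chunk_string, chunk_within_line and chunk_within_lines are made up for illustration.. the matching rule is an assumption: word strings match whole words, ignoring case and surrounding punctuation, so "a" does not match inside "another", and punctuation-only strings such as "," match at the end of a word.. the input is the output of step 1:

```python
import re
import string


def is_chunk_string(token, chunk_strings):
    """Assumed matching rule: whole words, or punctuation at a word's end."""
    lowered = token.lower()
    bare = lowered.strip(string.punctuation)
    for s in chunk_strings:
        if not s:
            continue  # ignore empty entries
        if s in (lowered, bare):
            return True  # whole-word match such as "to" or "this"
        if not any(c.isalnum() for c in s) and lowered.endswith(s):
            return True  # punctuation such as "," attached to a word
    return False


def chunk_within_line(line, befo, afte, both):
    """Add a return before BEFO, after AFTE, and on both sides of BOTH strings."""
    pieces = []
    for token in line.split():
        before = is_chunk_string(token, befo) or is_chunk_string(token, both)
        after = is_chunk_string(token, afte) or is_chunk_string(token, both)
        pieces.append(("\n" if before else "") + token + ("\n" if after else ""))
    # Join with spaces, then drop the spaces left next to an inserted return.
    return re.sub(r" ?\n ?", "\n", " ".join(pieces))


def chunk_within_lines(text, befo, afte, both):
    """Chunk every lyric line; empty lines from step 1 pass through untouched."""
    return "\n".join(
        chunk_within_line(line, befo, afte, both) if line.strip() else ""
        for line in text.split("\n")
    )


step1 = (
    "this line of lyrics to a song,\n\n"
    "another line, not as long\n\n\n"
    "skip a line to start a chorus"
)
print(chunk_within_lines(step1, befo={"to", "a"}, afte={","}, both={"this"}))
```

the sketch deliberately leaves in the extra returns at the start and end of lines, because removing them is the job of step 3.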

3. CLEANUP

if a line starts with a BOTH STRING,
then delete the return added before the string in step 2

if an AFTE STRING ends a line,
then delete the return added after the string in step 2

after step three, we should get this result:
-------------------------------+
this                           
line of lyrics                 
to                             
a song,                        
                               
another line,                  
not as long                    
                               
                               
skip                           
a line                         
to start                       
a chorus                       
-------------------------------+ 

the result is text chunked as per twext method, so please include some wiggle room in your bid here.. the main trick is to find and fix double chunk errors..
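
the two cleanup rules can be sketched like this, again in python as an illustration of the logic rather than the PHP deliverable.. the names is_chunk_string, cleanup_line and cleanup are made up, and the matching rule is the same assumption as before (whole words, or punctuation at the end of a word).. the sketch works on one lyric line at a time, because "starts a line" and "ends a line" refer to the original lyric lines, and those can no longer be told apart from stanza breaks once everything is one block of text.. cases the rules above do not mention, such as a BEFO STRING at the start of a line, are left untouched:

```python
import string


def is_chunk_string(token, chunk_strings):
    """Assumed matching rule: whole words, or punctuation at a word's end."""
    lowered = token.lower()
    bare = lowered.strip(string.punctuation)
    for s in chunk_strings:
        if not s:
            continue  # ignore empty entries
        if s in (lowered, bare):
            return True
        if not any(c.isalnum() for c in s) and lowered.endswith(s):
            return True
    return False


def cleanup_line(raw, afte, both):
    """Remove the returns step 2 added at the very start or end of a lyric line."""
    words = raw.split()
    if not words:
        return raw
    # Rule 1: the line starts with a BOTH string, so drop the return before it.
    if raw.startswith("\n") and is_chunk_string(words[0], both):
        raw = raw[1:]
    # Rule 2: an AFTE string ends the line, so drop the return after it.
    if raw.endswith("\n") and is_chunk_string(words[-1], afte):
        raw = raw[:-1]
    return raw


def cleanup(stanzas, afte, both):
    """Clean each chunked line, then restore the step 1 spacing."""
    return "\n\n\n".join(
        "\n\n".join(cleanup_line(line, afte, both) for line in stanza)
        for stanza in stanzas
    )


# Step 2 output for the example lyrics, kept as one string per lyric line.
stanzas = [
    ["\nthis\nline of lyrics\nto\na song,\n", "another line,\nnot as long"],
    ["skip\na line\nto start\na chorus"],
]
print(cleanup(stanzas, afte={","}, both={"this"}))
```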

we focus on chunking one language, english, for now.. meaning only one set of BEFO, AFTE, BOTH fields..

http://twext.cc/dev/dev2006.html#CHUNKSTER includes fields for exceptions of chunk criteria.. you should anticipate this function but we DO NOT NEED TO MAKE EXCEPTIONS AT THIS STAGE

our purpose is to create and test a very simple LYRIC CHUNKSTER written in PHP5.. we will test your lyric chunkster with a wide variety of lyrics, poems etc.. obviously, the above solution is not complete, so we'll probably run through a few testing cycles.. the problem, however, is clearly defined, narrow and simple, so this is unlikely to be a huge challenge for us..

if we DO need to add a layer of complexity to make lyric chunkster work, such as including fields to manage exceptions to BEFO/AFTE/BOTH chunk strings, then that will be extra work and extra pay for you..

MACHINE TRANSLATE

GOOGLE TRANSLATE THE LYRIC CHUNKS

when we are happy with lyric chunkster function, we will connect it to an http://translate.google.com interface.. ideally, we have an intermediary interface that integrates with http://twext.cc/twexter

thus, a user will

  1. go to http://translate.twext.com
  2. insert lyrics in textarea
  3. select language pair
  4. click "twext" button

image:DelyricTwexterFront.png

then, the software will

  1. chunk the lyrics via the THREE STEPS above
  2. insert chunked lyrics into text field at http://translate.google.com
  3. match language pair selected at http://translate.twext.com
  4. get translation data from http://translate.google.com
  5. convert translation to ALL CAPS
  6. post translated chunks in http://twext.cc/twexter TWXT twext field
  7. post original chunked lyrics in http://twext.cc/twexter TEXT text field
  8. send user to http://twext.cc/twexter with chunked translated text formatted as twext
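
the data handling in steps 4 to 7 can be sketched without touching any real translation service.. this is a python illustration of the logic only, not the PHP deliverable.. build_twexter_fields is a made-up name, and translate is a hypothetical callable standing in for whatever ends up talking to http://translate.google.com.. translating one chunk at a time is an assumption, chosen so every translated chunk stays on the same line as its original.. the demo uses a tiny fake dictionary in place of a real translator:

```python
def build_twexter_fields(chunked, translate):
    """Return the values for the TEXT and TWXT fields at twexter.

    `translate` is a hypothetical callable: one chunk in, its translation out.
    """
    twxt_lines = []
    for chunk in chunked.split("\n"):
        if chunk.strip():
            # Translations go into the TWXT field in ALL CAPS.
            twxt_lines.append(translate(chunk).upper())
        else:
            # Empty lines carry the line and stanza spacing; keep them as they are.
            twxt_lines.append("")
    return {"TEXT": chunked, "TWXT": "\n".join(twxt_lines)}


# Stand-in translator for the demo: a fake english-to-french lookup table.
fake = {"a line": "une ligne", "to start": "pour commencer", "a chorus": "un refrain"}
fields = build_twexter_fields(
    "a line\nto start\n\na chorus", lambda chunk: fake.get(chunk, chunk)
)
print(fields["TWXT"])
```

passing the translator in as a parameter keeps the chunk handling testable on its own, and leaves the choice of how to reach the translation site open.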

user, if translator, can make any corrections needed, and save the file on their machine.. later we'll worry about titles, users, storing and finding data, managing languages and all that stuff.. for now, we focus very simply on CHUNKING LYRICS and GETTING TRANSLATIONS and the PLUGIN to twexter..

COMPLETE

completion of work must include: plugin interface to core; clear English code comments; pseudocode; author name and email contact; copyright info; GPLv2 license; and source code shared at http://sf.net/projects/twexter

PLUGIN TO PHP TWEXTER

http://twext.cc/go/plugin describes a PHP architecture being developed to include/exclude functions like LYRIC CHUNKSTER into twexter software builds.. this should be pretty easy to coordinate

TEST.TWEXT.CC

plugin core connects lyric chunkster with http://twext.cc/twexter function.. we test complete system installable as needed at http://test.twext.cc

CLEAR COMMENTS IN ENGLISH

your core code must be explicitly commented and explained so other developers can easily participate..

EXPLICIT PROGRAM LOGIC

this is freely licensed open software.. parallel systems may emerge in python, ruby, scheme, etc.. PSEUDO CODE, UML, FLOWCHART or likewise explicit description of the program logic is required to help others understand, use and extend your work..

SHARING AND ATTRIBUTION

please include your name and email so developers can know who did the work and contact you if need be.. successful execution of this work may also serve to promote your services.

ASSIGN COPYRIGHT TO READ.FM

this is an explicit work-for-hire agreement: you assign your copyright to read.fm, which freely releases your work under the GPLv2..

INCLUDE COPYRIGHT AND LICENSE NOTICE ON ALL CODE:

twext helps you learn to read in any language
free software: http://sf.net/projects/twexter
Copyright © 2008 READ.FM http://license.read.fm
http://more.read.fm/more more read, more market

SHARE SOURCE CODE

you share complete source code and all documentation at http://sf.net/projects/twexter

PAYMENT

funds are already in GAF escrow.. upon satisfactory completion, payday

SERIOUS

please spare us if not serious.. Waqas won http://twext.cc/dev/twexterBASIC.html job by delivering a rough working demo *before* bid awarded.. he showed real skills and interest.. these next steps, done well, can lead to extensive future collab..


zura re chunkster some old code might be helpful to do chunkster?

chances are this chunkster for now will.. we probably should add a very simple layer between lyric chunkster and translate.google..

re: chunkster and editing chunks, the live preview at http://twext.cc/twexter makes editing nearly impossible.. an earlier twexter had twexml option and easier chunk editing: http://twext.cc/twexml

both above basic twexter versions (code by Waqas) printing on the wiki soon

here's a list of chunkster chunks in english to play with:

BEFO

i, your, to, under, on, at, of, the, you, your, you're a, so, my, is, too, i'm, i've, you'll, as, she, she's, in, don't, by, are, has

AFTE

,, then, it, with, but, there's, me,

BOTH

and, that, if, now, how, what, when, why, where, for, with, whenever, without, who, in


more advanced chunkster is at

but we aren't doing that yet
