Lift technology - is it on its way?
Thread poster: Wojciech_ (X)
RWS Community
United Kingdom
Local time: 23:52
English
Thanks Kevin... Aug 27, 2015

kflanagansdl wrote:

Studio 2015 contains some great new AutoSuggest functionality as mentioned in some of the posts here, but it doesn’t yet contain Lift technology. Although SDL acquired Lift before releasing Studio 2015, new features/technology don’t get hastily added in and rushed out. As well as working as a translator, I’ve worked as a software developer in several places and seen many others, and while nowhere is perfect, I’ve found software development here at SDL to be very, very professional and polished, from software architecture through to UX design – and so getting new technology right takes time. The Lift prototype shown working in Studio 2014 only supported 4 languages, and although the ‘engine’ is language-independent, we do need to bring together certain per-language resources so as to support a more professionally-useful range of languages, and we have to run a lot of performance tests to get the best resource combinations. Some new subsegment recall features will be available quite soon (specifically, what I call ‘TM-TDB’ recall in the paper at http://www.jostrans.org/issue23/art_flanagan.php), while more advanced Lift functionality will take longer, perhaps to be available in 2016.

I hope that clarifies where we’re at with Lift …




... that cleared up some obvious misunderstanding I had too... time to read your papers more thoroughly!!

Regards

Paul
SDL Community Support


 
Meta Arkadia
Local time: 04:52
English to Indonesian
Summary Aug 27, 2015

kflanagansdl wrote:
...you’re right to be sceptical about performance based on a demo video, but more extensive tests show Lift performing much better than DeepMiner. You can read more about those tests at...


I successfully reproduced the test mentioned above. That's not much of a surprise. Subsegment matching has been available in CafeTran for several years (even before DV's DeepMiner), and it just works.



It shows the "hit" where you'd expect it, in the TM (1), or you can invoke it from the menu (2). By the way, it works for all languages.

I agree with you, kflanagansdl, that the demo is no proof of the performance, but I do doubt your statement that LIFT performs better than DeepMiner, let alone CafeTran. The results of your "more extensive tests" would need to be expanded to include the method and real-life situations, in particular within the CAT tool itself. Yes, I challenge you.

To summarise:

  • LIFT is late to the party
  • It's not yet at the party at all
  • At the moment, it works for a very limited number (4) of languages only
  • It may work better than other subsegment matching algorithms, but we want proof

    Cheers,

    Hans

     
    Wojciech_ (X)
    Poland
    Local time: 23:52
    English to Polish
    TOPIC STARTER
    Wait a second... Aug 27, 2015

    Meta Arkadia wrote:

    I successfully reproduced the test mentioned above. That's not much of a surprise. Subsegment matching has been available in CafeTran for several years (even before DV's DeepMiner), and it just works.



    It shows the "hit" where you'd expect it, in the TM (1), or you can invoke it from the menu (2). By the way, it works for all languages.

    I agree with you, kflanagansdl, that the demo is no proof of the performance, but I do doubt your statement that LIFT performs better than DeepMiner, let alone CafeTran. The results of your "more extensive tests" would need to be expanded to include the method and real-life situations, in particular within the CAT tool itself. Yes, I challenge you.

    To summarise:

  • LIFT is late to the party
  • It's not yet at the party at all
  • At the moment, it works for a very limited number (4) of languages only
  • It may work better than other subsegment matching algorithms, but we want proof

    Cheers,

    Hans



    Hans, I might be wrong, but from what I can see CafeTran shows hits on more or less the same basis as memoQ, that is, it displays useful phrases it has found in the TM, BUT without their exact translation.
    In other words, you didn't reproduce the test fully.
    Look at 4:41 of the video, and the text highlighted in blue: Lift was able not only to find a subsegment match in the TM's source text, but ALSO its corresponding translation!

    The image you have attached shows only the full segment where the subsegment match has appeared, but in CafeTran it's STILL up to the translator to scan the segment and find the translation.
    In my humble opinion, the ability of a CAT tool to show certain subsegment matches, but without their relevant translations, is a nice thing, but far from revolutionary. Trados 2015 is capable of such findings even now, without Lift.

    And this is exactly where Lift differs from Auto-Concordance. It's able to provide you with the appropriate translation of a subsegment match, which, I believe, no tool has done so far. That matters because it opens new possibilities, for example providing you with matches directly via AutoSuggest, which is a great thing.

    Of course, it needs further testing and has to become compatible with more languages (Polish, please!), but it has great potential!





    [Edited at 2015-08-27 11:29 GMT]

    [Edited at 2015-08-27 11:32 GMT]

    [Edited at 2015-08-27 11:34 GMT]


     
    Meta Arkadia
    Local time: 04:52
    English to Indonesian
    A matter of settings Aug 27, 2015

    pro-lingua wrote:
    In other words, you didn't reproduce the test fully.


    I did, but I should have adjusted my settings. The problem is that I usually work with rather large TMs (for EU BS) that would trigger too many false positives and slow down moving from one segment to the next (there's a trick to avoid that for TMs of, say, fewer than 250,000 segments).

    In the words of CafeTran programmer Igor:

    The function has been present in CT for a few years, and if well tuned, it can be used with excellent accuracy both with Auto-assembling and Auto-completion. In particular, see the "Subsegment to Auto threshold" (set to 7 hits by default) option in Edit - Options - Memory tab. If you find that any false positive hits get in the way with Auto-assembling, just increase that number. You can also lower the number to get more hits if your TM is of a good quality (a specialized one). It really boils down to the size and quality of your translation memories to produce satisfactory results with this function.


    Cheers,

    Hans
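Igor's description suggests a frequency filter sitting between the subsegment matcher and Auto-assembling. As a minimal sketch, assuming (our assumption, not CafeTran's documented behaviour) that the threshold is the minimum number of TM source segments a fragment must occur in before it is offered, it might look like this; `fragments_for_auto` and its parameters are hypothetical names:

```python
from collections import Counter
from typing import Iterator


def ngrams(words: list[str], min_n: int) -> Iterator[str]:
    """Yield every contiguous word n-gram of length >= min_n."""
    for n in range(min_n, len(words) + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def fragments_for_auto(segment: str, tm_sources: list[str],
                       threshold: int = 7, min_n: int = 2) -> list[str]:
    """Return fragments of `segment` found in at least `threshold` TM source segments.

    Hypothetical illustration only: naive lowercase, whitespace-token matching.
    """
    # Pad with spaces so that " involvement of " only matches whole words.
    padded = [f" {src.lower()} " for src in tm_sources]
    counts: Counter[str] = Counter()
    for fragment in set(ngrams(segment.lower().split(), min_n)):
        counts[fragment] = sum(f" {fragment} " in src for src in padded)
    return sorted(f for f, c in counts.items() if c >= threshold)


tm = ["the degree of involvement of the parties"] * 3 + ["involvement of staff"]
print(fragments_for_auto("improves involvement of employees", tm, threshold=4))  # ['involvement of']
print(fragments_for_auto("improves involvement of employees", tm, threshold=5))  # []
```

Raising `threshold` drops fragments that occur only rarely, while lowering it admits more of them, which mirrors Igor's advice: raise the number if false positives get in the way, lower it if the TM is of good quality and specialised.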


     
    Samuel Murray  Identity Verified
    Netherlands
    Local time: 23:52
    Member (2006)
    English to Afrikaans
    CafeTran doesn't detect target language, so it ain't "lift" Aug 27, 2015

    pro-lingua wrote:
    The image you have attached shows only the full segment where the subsegment match has appeared, but in CafeTran it's STILL up to the translator to scan the segment and find the translation.


    I was just about to post the same. CafeTran only tells the translator that it's found a segment with "conforme au cahier des charges" in it, but doesn't tell the translator what the target text of that segment is. Lift does.



     
    kflanagansdl
    United Kingdom
    CafeTran and subsegment recall Aug 27, 2015

    Yes, @pro-lingua, you're absolutely right. The key point is that Lift will identify the translation of the fragment within the larger segment. @Meta Arkadia, you might find it interesting to read the paper at http://www.jostrans.org/issue23/art_flanagan.php . Using the typology defined there, the CafeTran functionality you've shown would be described as ACS (see section 4.2) as opposed to the DTA functionality provided by Lift (see section 4.3). It certainly can be useful to have ACS, but having to scan the match translation segment to identify the translation of a fragment can be tiresome, especially when there are multiple results for the same 'query' sentence, or even multiple variant translations for the same matching fragment. You might find http://www.kftrans.co.uk/lift/fillingInTheGaps.pdf an interesting read as well.


     
    Michael Beijer  Identity Verified
    United Kingdom
    Local time: 22:52
    Member (2009)
    Dutch to English
    same thing, iyam Aug 27, 2015

    Samuel Murray wrote:

    I was just about to post the same. CafeTran only tells the translator that it's found a segment with "conforme au cahier des charges" in it, but doesn't tell the translator what the target text of that segment is. Lift does.





    It might not in this example, but with tweaked settings, or a different set of TMs, it might. CT can also guess stuff, just like LIFT. Hans's screenshot shows a few longer hits. However, sometimes they are much shorter, and actually correct: i.e., they will consist of only the source fragment paired with its target, all guessed by CT's subsegment matching algorithm.


     
    Wojciech_ (X)
    Poland
    Local time: 23:52
    English to Polish
    TOPIC STARTER
    Warning! Aug 27, 2015

    Michael Beijer wrote:

    It might not in this example, but with tweaked settings, or a different set of TMs, it might. CT can also guess stuff, just like LIFT. Hans's screenshot shows a few longer hits. However, sometimes they are much shorter, and actually correct: i.e., they will consist of only the source fragment paired with its target, all guessed by CT's subsegment matching algorithm.



    Michael, if you promote CT in this forum too much, then SDL may take it over, or worse (for CT users - good for Trados users), hire Mr Kmitowski, and the bright future of CT might be endangered.


     
    Roy Oestensen  Identity Verified
    Denmark
    Local time: 23:52
    Member (2010)
    English to Norwegian (Bokmal)
    Would need marketing survey for that Aug 27, 2015

    pro-lingua wrote:
    Michael, if you promote CT in this forum too much, then SDL may take it over, or worse (for CT users - good for Trados users), hire Mr Kmitowski and the bright future of CT might be endangered


    If I were Studio, I would definitely do a survey to check whether it would improve Studio's market share or not. My impression of CafeTran isn't very favourable compared to Studio or Déjà Vu, so I would not be any more inclined towards Studio than I already am if they acquired CafeTran.

    Roy


     
    kflanagansdl
    kflanagansdl
    United Kingdom
    re: same thing, iyam Aug 27, 2015

    Hi @Michael Beijer - it sounds to me as if what you're describing are cases where the functionality is referred to as TM-TDB in the typology at http://www.jostrans.org/issue23/art_flanagan.php. If your sentence to translate is (say), "This will include creating a Dynamic Purchasing System for the customer", and the TM contains a TU whose source text is (only) "Dynamic Purchasing System" (because it's occurred previously as a segment by itself, such as a section title), then of course the target text of that TU can be reliably proposed as a translation for the fragment "Dynamic Purchasing System". In these cases, there's no need for any Lift-like ability to identify the fragment of TU target text that corresponds to a matching TU source text fragment, because the match is with the whole TU source segment. But these cases are generally much rarer. Is this the kind of 'guess' performed by CafeTran that you're referring to, and if not, could you provide examples of what is? I'm very confident that it isn't the 'same thing' as what Lift does, and that Lift provides significant additional functionality (which can be measured against other CAT tools using the methodology described in the above paper and the materials available at http://www.kftrans.co.uk/benchmarks ), but of course I'd be interested to see whatever CafeTran functionality cases might seem to show otherwise.
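The TM-TDB case described here needs no alignment at all, because it is the whole TU source that matches. A minimal sketch, assuming a TM held as plain (source, target) string pairs and case-insensitive, word-bounded matching; `tm_tdb_matches` is a hypothetical name and the Dutch targets are illustrative only:

```python
import re


def tm_tdb_matches(sentence: str, tm: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return TUs whose *entire* source text occurs inside `sentence`."""
    hits = []
    for src, tgt in tm:
        # Whole TU source as a word-bounded, case-insensitive substring of the sentence.
        if re.search(r"\b" + re.escape(src) + r"\b", sentence, flags=re.IGNORECASE):
            hits.append((src, tgt))
    # Longest matching TU source first.
    return sorted(hits, key=lambda hit: len(hit[0]), reverse=True)


tm = [
    ("Purchasing", "inkoop"),
    ("Dynamic Purchasing System", "dynamisch aankoopsysteem"),
    ("customer service", "klantenservice"),
]
sentence = "This will include creating a Dynamic Purchasing System for the customer"
for src, tgt in tm_tdb_matches(sentence, tm):
    print(f"{src} -> {tgt}")
```

Because the TU target translates exactly the matched text, it can be proposed as-is; ordering longer matches first is a design choice made here on the assumption that a longer whole-segment match is usually the more useful suggestion.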


     
    Wojciech_ (X)
    Poland
    Local time: 23:52
    English to Polish
    TOPIC STARTER
    Hmmm Aug 27, 2015

    Roy Oestensen wrote:

    If I were Studio, I would definitely do a survey to check whether it would improve Studio's market share or not. My impression of CafeTran isn't very favourable compared to Studio or Déjà Vu, so I would not be any more inclined towards Studio than I already am if they acquired CafeTran.

    Roy


    I was merely joking.


     
    Michael Beijer  Identity Verified
    United Kingdom
    Local time: 22:52
    Member (2009)
    Dutch to English
    here’s an example of what I mean (CafeTran LIFTing) Aug 27, 2015

    In this example, I quickly ran Total Recall against my massive TMlookup TM database (with around 45,000,000 TUs) and generated a temp TMX. Note that this took less than one minute, and the TMX is now available for use in my current project.

    LIFT = ‘subsegment matching’ in CafeTran

    [Screenshots of the CafeTran subsegment matches not preserved]

    Michael

    @Hans: I indeed tweaked my "Subsegment to Auto threshold" setting a bit (as suggested by Igor). It was set to 2. I changed it to 7 for this example.

    [Edited at 2015-08-27 13:22 GMT]


     
    Michael Beijer  Identity Verified
    United Kingdom
    Local time: 22:52
    Member (2009)
    Dutch to English
    @kflanagansdl Aug 27, 2015

    kflanagansdl wrote:

    Hi @Michael Beijer - it sounds to me as if what you're describing are cases where the functionality is referred to as TM-TDB in the typology at http://www.jostrans.org/issue23/art_flanagan.php. If your sentence to translate is (say), "This will include creating a Dynamic Purchasing System for the customer", and the TM contains a TU whose source text is (only) "Dynamic Purchasing System" (because it's occurred previously as a segment by itself such as a section title), then of course the target text of that TU can be reliably proposed as a translation for the fragment "Dynamic Purchasing System". In these cases, there's no need for any Lift-like ability to identify the fragment of TU target text that correspond to a matching TU source text fragment, because the match is with the whole TU source segment. But these cases are generally much rarer. Is this the kind of 'guess' performed by CafeTran that you're referring to, and if not, could you provide examples of what is? I'm very confident that it isn't the 'same thing' as what Lift does, and that Lift provides significant additional functionality (which can be measured against other CAT tools using the methodology described in the above paper and the materials available at http://www.kftrans.co.uk/benchmarks ), but of course I'd be interested to see whatever CafeTran functionality cases might seem to show otherwise.


    Hi Kevin,

    Sorry, but I haven't had time to read your papers (which I will though, as they look very interesting).

    In my previous post, I tried to show you CafeTran doing what I believe is pretty much the same thing as Lift.

    My src segment:

    "Investeren in Employee Wellness verbetert de gezondheid, motivatie en betrokkenheid van uw medewerkers."
    (Investing in Employee Wellness improves the health, motivation and engagement of your employees.)

    One of my TMs contains:

    In hetzelfde punt geeft zij aan dat zij met name rekening heeft gehouden met de graad van betrokkenheid van elk van de ondernemingen bij de heimelijke afspraken en de rol die elk van hen daarbij had vervuld, en dat zij geen van de ondernemingen als "toonaangevend" had aangemerkt voor de vraag wie de grootste verantwoordelijkheid voor de inbreuk moest worden toebedeeld.
    =
    In point 53, it stated, first, that it had considered, inter alia, the degree of involvement of each of the undertakings in the collusive arrangements and the role played by them therein and, second, that it had not identified any undertaking as the 'ringleader' for the purposes of attributing the major responsibility.

    CafeTran managed to guess that "betrokkenheid van" = "involvement of". Is this what Lift does?

    Michael

    [Edited at 2015-08-27 13:30 GMT]


     
    kflanagansdl
    United Kingdom
    CafeTran 'Total Recall' Aug 27, 2015

    Hi @Michael Beijer - thanks for posting details of that example from CafeTran when using Total Recall. It does differ from what Lift provides in a number of respects (once again, do feel free to read the paper(s) cited for further details). If I'm interpreting the screenshots/details well, the functionality provided by Total Recall appears to be what is referred to at http://www.jostrans.org/issue23/art_flanagan.php as Bilingual Fragment Extraction (BFE). There are a number of other CAT tools with BFE functionality, and it certainly can be useful, but it does have some disadvantages.

    Firstly, if you add new content to the TM, that content isn't immediately available for subsegment recall; you have to run the extraction step again. Secondly, depending on just how the BFE is implemented (e.g. it may be statistical, as with Trados AutoSuggest Creator, and I'm guessing Total Recall also uses statistical methods, or it may be linguistic, as with Similis), there can be important prerequisites and/or 'blind spots'. Statistical BFE requires a TM of considerable size (as with the one you used), and while it's possible to 'pad out' a TM for a given project with some other data, that runs the risk of showing you suggestions that aren't suitable for the field you're translating in. This is important because of another disadvantage: BFE implementations often essentially 'decontextualise' the fragments recalled, so you don't get to see the TU from which the fragment match and fragment translation have been drawn (though in the case of Total Recall, it looks like it may also show you the original context; I can't quite tell from the screenshots, but that would be good).

    Most importantly, statistical BFE can be very selective about the kinds of thing it will recall, because of the way the maths works. Generally speaking, translations of one- or two-word fragments with distinct meanings can be identified pretty well (certainly if they occur several times), while a key ten-word clause in a contract that occurs, say, only once before in the TM, but which it's legally important to translate in the same way, is less likely to be identified. If BFE is implemented linguistically, you're less likely to have a TM size requirement, but more likely to have blind spots based on incommensurable grammar. If BFE is implemented using EBMT-inspired techniques, some level of repetition is again needed to identify translations, so translations for fragments found only once in the TM are again much less likely to be identified.

    It'd be very interesting to know how well the subsegment recall provided by Total Recall measures up against (1) the test suite at http://www.kftrans.co.uk/benchmarks, though it's quite a lot of work to perform all those tests, and (2) the wishes of translators as found in the survey reported in the aforementioned paper.

    In summary, I'd say Lift differs from what you've shown in CafeTran in that its subsegment recall will work (1) even if the fragment is found only once in the TM, (2) even if the TM has only 1 TU in it, and (3) dynamically, without requiring any extraction step. While recall when your TM has only 1 TU is obviously slightly artificial, it does underline an important aspect of the functionality, i.e. not requiring a large TM. My posts here can't explain or provide nearly as much detail as my papers, though, so do take a look at those if this doesn't seem significant.
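The point about repetition can be illustrated with a very small statistical sketch. It scores source/target word pairs by the Dice coefficient of their co-occurrence across TUs, a common textbook association measure chosen here purely for illustration; it is not claimed to be how AutoSuggest Creator, Total Recall or Lift work, it is word-level only, and the three TUs are invented toy data:

```python
from collections import Counter
from itertools import product


def dice_scores(tm: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Dice coefficient 2*C(s,t) / (C(s) + C(t)), counting each word once per TU."""
    src_freq: Counter[str] = Counter()
    tgt_freq: Counter[str] = Counter()
    pair_freq: Counter[tuple[str, str]] = Counter()
    for src, tgt in tm:
        s_words, t_words = set(src.lower().split()), set(tgt.lower().split())
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        pair_freq.update(product(s_words, t_words))
    return {(s, t): 2 * c / (src_freq[s] + tgt_freq[t])
            for (s, t), c in pair_freq.items()}


def best_translations(scores: dict[tuple[str, str], float], src_word: str) -> list[str]:
    """All target words tied for the highest score with `src_word`."""
    candidates = {t: v for (s, t), v in scores.items() if s == src_word}
    top = max(candidates.values())
    return sorted(t for t, v in candidates.items() if v == top)


tm = [
    ("de betrokkenheid van de ondernemingen", "the involvement of the undertakings"),
    ("betrokkenheid van medewerkers", "involvement of employees"),
    ("grote betrokkenheid", "great involvement"),
]
scores = dice_scores(tm)
print(best_translations(scores, "betrokkenheid"))  # seen 3 times: ['involvement']
print(best_translations(scores, "ondernemingen"))  # seen once: ['the', 'undertakings']
```

A word seen three times gets a single clear winner, while a word seen only once ties with every other target word that also appears only in that same TU, which is the kind of blind spot for one-off fragments described above.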


     