How to create an online dictionary?
Thread poster: Michael Beijer
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 23:05
Member (2009)
Dutch to English
+ ...
Sep 7, 2010

I was wondering if anyone has any tips about how to create an online dictionary. More like a simple bilingual glossary actually. I would like for it to have a nice simple search box thingee, which might also display the context of the term like right clicking "See context" in Xbench. Basically kind of like TecDic or dict.cc or some such.

I am more or less able to code html, and would like to keep the database in txt if possible, but I also have Microsoft Access; I can also create MySQL databases on my server ...

Any ideas as to how I might proceed?

Thanks in advance!

Michael


 
John Fossey
John Fossey  Identity Verified
Canada
Local time: 18:05
Member (2008)
French to English
+ ...
PHP or ASP Sep 7, 2010

Better learn PHP or ASP. HTML is client-side, but you want server-side coding.

 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 23:05
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
MySQL & PHP Sep 7, 2010

Hmm. I found something interesting here:

http://www.bluehostforum.com/showthread.php?482-How-Do-I-Build-A-Dictionary-Database

Where someone suggests:

use a database (MySQL, XML, whatever) which stores each word, and give each word an index and a language specifier, then give each word a language-equivalent field (used to define the index of another word which means the same) like so:

(index)  (word)   (lang)  (lang_equiv)
1        apple    eng     2
2        manzana  span    1

then use a method which converts it all. In PHP & MySQL this would be like so:

PHP Code:

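// Look up $word and return its equivalent in the other language.
// Note: $word goes into the SQL unescaped, exactly as in the quoted post.
function translate( $word )
{
    // Find the row for the word and read the index of its equivalent.
    $equiv_index = mysql_result( mysql_query( "SELECT `lang_equiv` FROM dictionary WHERE `word` = '" . $word . "'" ), 0, 'lang_equiv' );

    // Fetch the word stored at that index.
    return mysql_result( mysql_query( "SELECT `word` FROM dictionary WHERE `index` = " . $equiv_index ), 0, 'word' );
}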
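
For comparison, here is a minimal sketch of the same table layout and lookup, written in Python against an in-memory SQLite database so that it runs without a server. The columns follow the example above, except that `index` is renamed `id` here because INDEX is a reserved word in SQL. The `translate` helper and the two sample rows are illustrative assumptions, not the code from the linked thread.

```python
import sqlite3

# In-memory database; a real site would use a file or a MySQL server instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dictionary ("
    " id INTEGER PRIMARY KEY,"      # the (index) column from the example
    " word TEXT NOT NULL,"
    " lang TEXT NOT NULL,"
    " lang_equiv INTEGER)"          # id of the equivalent word in the other language
)
conn.executemany(
    "INSERT INTO dictionary (id, word, lang, lang_equiv) VALUES (?, ?, ?, ?)",
    [(1, "apple", "eng", 2), (2, "manzana", "span", 1)],
)


def translate(word):
    """Return the equivalent of `word`, or None if it is not in the table."""
    row = conn.execute(
        "SELECT t.word FROM dictionary AS s"
        " JOIN dictionary AS t ON t.id = s.lang_equiv"
        " WHERE s.word = ?",
        (word,),
    ).fetchone()
    return row[0] if row else None


print(translate("apple"))    # manzana
print(translate("manzana"))  # apple
```

The self-join follows `lang_equiv` from the matched row to its partner row, so one query does the work of the two separate queries in the PHP version.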


Now, what I am ideally looking for is a way to take a few shortcuts. That is, is there any way to simplify some of the steps involved here, such as converting bilingual glossaries from tab-delimited text files into a database-friendly format?
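
One possible shortcut for the conversion step, sketched in Python with only the standard library. It assumes a glossary laid out as source term, TAB, target term, TAB, optional context; that layout, the `glossary` table, and the `load_glossary` name are all assumptions made for illustration. SQLite stands in for MySQL so the sketch runs as-is.

```python
import csv
import io
import sqlite3

# Stand-in for the contents of a tab-delimited glossary file (assumed layout:
# source term <TAB> target term <TAB> optional context).
SAMPLE = "fiets\tbicycle\tDe fiets staat buiten.\nhuis\thouse\t\n"


def load_glossary(tsv_file, conn):
    """Read tab-delimited rows from an open text file into the glossary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS glossary (source TEXT, target TEXT, context TEXT)"
    )
    # QUOTE_NONE keeps quotation marks inside terms as ordinary characters.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        source, target = fields[0].strip(), fields[1].strip()
        context = fields[2].strip() if len(fields) > 2 else ""
        rows.append((source, target, context))
    conn.executemany("INSERT INTO glossary VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_glossary(io.StringIO(SAMPLE), conn)
print(count)  # 2
```

For a real file, `open(path, encoding="utf-8", newline="")` would take the place of the `io.StringIO` stand-in.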

Also, can someone point me in the right direction re. possible PHP templates for dictionaries/language databases, etc.

Michael


 
Christopher Schmidt
Christopher Schmidt  Identity Verified
United States
Local time: 18:05
German to English
+ ...
consider perl as well Sep 8, 2010

As an exercise to learn Perl, I created a smallish German-English online dictionary a number of years ago (www.tranzmatic.com). The data are contained in two parallel text files (one word per line). When a user makes a query, the Perl script searches the respective file (German or English) for a match. If a match is found, it opens the other file and returns the entry on the same line number as the match. The Perl script then redraws the page.
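
The parallel-file idea can be sketched roughly as follows. This is Python rather than Perl, and it is not the tranzmatic.com script: the file names, the sample words, and the `lookup` helper are assumptions made for illustration.

```python
import os
import tempfile


def lookup(query, search_path, other_path):
    """Find `query` in search_path; return the same-numbered line of other_path."""
    wanted = query.strip().lower()
    with open(search_path, encoding="utf-8") as search_file:
        for line_number, line in enumerate(search_file):
            if line.strip().lower() == wanted:
                break
        else:
            return None  # no match in the searched file
    with open(other_path, encoding="utf-8") as other_file:
        for other_number, line in enumerate(other_file):
            if other_number == line_number:
                return line.strip()
    return None  # the two files are out of step


# Two tiny parallel files, written to a temporary directory for the demo.
tmp = tempfile.mkdtemp()
german_path = os.path.join(tmp, "german.txt")
english_path = os.path.join(tmp, "english.txt")
with open(german_path, "w", encoding="utf-8") as f:
    f.write("Haus\nBaum\nHund\n")
with open(english_path, "w", encoding="utf-8") as f:
    f.write("house\ntree\ndog\n")

print(lookup("Baum", german_path, english_path))  # tree
print(lookup("dog", english_path, german_path))   # Hund
```

The approach only works while both files keep exactly the same number of lines in the same order, so every edit has to touch both files.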

When I created the dictionary in 1999, PHP wasn't standard on a lot of hosting services, so I went with what was readily available. I'm sure you could do a lot more with PHP or another database-driven approach available today. But even with text files and Perl, you could extend this idea to include some context (use a comma-delimited text file to add any context you might have after a word). Or use XML - then you can add all kinds of information about a word (gender, part of speech, examples).

Shoot me a note if you'd like a copy of the script.

Chris


 
Stanislaw Czech, MCIL CL
Stanislaw Czech, MCIL CL  Identity Verified
United Kingdom
Local time: 23:05
Member (2006)
English to Polish
+ ...
SITE LOCALIZER
Relatively easy Sep 8, 2010

If you are not a PHP/HTML expert (judging by your post, that is the case), I would suggest a ready-made solution. You could use the free CMS Joomla; there are a few ready-made extensions that could suit your purpose, some of them free.

You can have a look at extensions here: http://extensions.joomla.org/extensions/living/education-a-culture/glossary

Here you can see how it works, on my page: http://polish-translation.biz/glossary-of-legal-and-business-terms.html

Best Regards
Stanislaw


 
mediamatrix (X)
mediamatrix (X)
Local time: 18:05
Spanish to English
+ ...
DIY - feasible and worthwhile Sep 8, 2010

It’s difficult to be specific without knowing more about your existing data and your ambitions, and what options you have on your web-server. But if you are keen to get your hands dirty and take your IT skills a bit beyond html coding, building a simple bi-lingual on-line glossary is a worthwhile hands-on, learn-as-you-go project.

As John has mentioned, you will need a server-side application – most likely PHP or ASP. If you are familiar with Visual Basic for Applications (MS Office macros, etc.) then you might find ASP easier than PHP, but ASP requires a Windows server so if you’re on Linux your choice is more limited.

The conversion of your tab-delimited text files to make them ready to dump into your on-line database should be fairly easy. First, you need to decide how you’re going to structure the on-line database (in mySQL, for example) so you know what table structure you are aiming for when you convert the tab-delimited data. For starters, you can convert the tabbed text into tables in Word and import the tables into Excel or Access.

When you have the data in Excel or Access, structured in a manner similar to what you want to end up with in the on-line database, then I could lend you a home-brew Windows program I wrote a little while ago to build equivalent mySQL tables and convert your Excel/Access data to mySQL format for up-load to your server.

Item three on your ‘to do’ list is building the user interface – at its most simple a dedicated HTML page with a half a dozen lines of program code embedded in it – something like the PHP script you showed us earlier, or the equivalent in ASP.

Then, when you start getting correct responses from your terminology queries you will no doubt think up lots of ways to enhance your start-up project. And one of the key advantages of home-brew databases is that you can adapt and perfect them to suit your requirements to an extent that no 'one-size-fits-all' freeware or $$$ware will allow. The only risk you need to be aware of is that you may get to the point where IT takes over completely from translation.

MediaMatrix


Edited to add: If this glossary is intended to be a part of the website linked from your Proz.com profile, you should perhaps be thinking in terms of also using mySQL for your resources directory (catalogue of on-line glossaries, etc.).

[Edited at 2010-09-08 12:08 GMT]


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 23:05
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
Thanks everyone! Sep 8, 2010

Wow ... thanks.

As usual, I just got a huge job so I will have to get back to this on Monday ... but it definitely looks doable. I am very much looking forward to starting.

I think I am going to try it the hard(er) way. Christopher, your http://www.tranzmatic.com/ looks cool. That's basically exactly what I have in mind. I want to make this domain: http://woordenboek.mx/ (which is still just a page of info about various Dutch/English translators' resources) into kind of a more open source version of TecDic, where the source files can be downloaded in CSV or something similar, and people can suggest terms, etc.

Mediamatrix, I might be asking to borrow that home-made conversion program you mentioned. It sounds great!

Hmm. Perl or PHP... But first, I think I'd better get this (translation) job done;)

Michael


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 23:05
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
Reverse engineering TecDic Sep 8, 2010

Hmm, seems like http://www.tecdic.com is using MySQL and php. I tried to use the "Add a new translation to TecDic" feature and I was presented with this error message:

Warning: mysql_connect() [function.mysql-connect]: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) in /home/sites/www.tecdic.com/web/thankyou.php on line 3

Warning: mysql_select_db(): supplied argument is not a valid MySQL-Link resource in /home/sites/www.tecdic.com/web/thankyou.php on line 4

Warning: mysql_query(): supplied argument is not a valid MySQL-Link resource in /home/sites/www.tecdic.com/web/thankyou.php on line 6

Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/sites/www.tecdic.com/web/thankyou.php on line 6


Interesting.

Michael


[Edited at 2010-09-08 22:14 GMT]


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 00:05
English to Hungarian
+ ...
one tidbit Sep 8, 2010

mediamatrix wrote:

The conversion of your tab-limited text files to make them ready to dump into your on-line database should be fairly easy. First, you need to decide how you’re going to structure the on-line database (in mySQL, for example) so you know what table structure you are aiming for when you convert the tab-delimited data. For starters, you can convert the tabbed text into tables in Word and import the tables into Excel or Access.


The first half is probably good advice, the rest doesn't sound very convincing to me.
Going via Word is just an unnecessary complication, frankly.
If you need to get data into Excel for some reason, it can open tab separated files straight from the txt and then save them as xls. If you need to automate xls generation, there's also a really good perl module that can generate an xls out of pretty much anything. I was just playing with it today, I have a script that will generate an xls out of a 2 to 4 column txt. Pretty handy if your files are large: it can handle up to 200,000 records, splitting them into as many worksheets as needed. Incidentally, Excel works reasonably well for this sort of stuff: it can handle a lot more text than you could view in Word or a text editor.
Obviously, there are perl modules and other advanced automated solutions for populating databases the same way, but I don't know the particulars. If you're using perl, this seems like the place to start: http://search.cpan.org/~timb/DBI-1.613/DBI.pm
http://dbi.perl.org/

[Edited at 2010-09-08 20:40 GMT]


 
mediamatrix (X)
mediamatrix (X)
Local time: 18:05
Spanish to English
+ ...
Personal prejudices Sep 9, 2010

FarkasAndras wrote:
The first half is probably good advice, the rest doesn't sound very convincing to me.
Going via Word is just an unnecessary complication, frankly.


In my (not inconsiderable) experience of getting multi-sourced data into mySQL, pre-digesting text data in a text environment is 'almost always' the best place to start. The only exceptions would be where your text data has come undiluted and unpolluted from a proven text-base in the first place. Even if your data is, supposedly, a perfectly ordinary, well configured, tab-delimited text file, you would be amazed how often it will get mangled if you try to dump it 'unseen' into a database (or into Excel, for that matter). It does no harm - and can give valuable prior warning of potential data structure inconsistencies, right up front - to put the tab-delimited data through Word as a first step.

FarkasAndras wrote:
If you need to get data into Excel for some reason, it can open tab separated files straight from the txt ..


Yes, it can. Subject to the earlier proviso regarding impeccable text formatting... And if the formatting is haywire, you're better off fixing it in a proper text environment.

FarkasAndras wrote:
... and then save them as xls.


XLS? - Who wants or needs XLS ? Michael certainly doesn't!


FarkasAndras wrote:
If you need to automate xls generation, there's also a really good perl module that can generate an xls out of pretty much anything. I was just playing with it today, I have a script that will generate an xls out of a 2 to 4 column txt. Pretty handy if your files are large: it can handle up to 200,000 records, splitting them into as many worksheets as needed. Incidentally, Excel works reasonably well for this sort of stuff: it can handle a lot more text than you could view in Word or a text editor.


Harking back to the title of this post, this is where prejudice comes in. Personally, I would never, ever use Excel to process any text data - let alone a glossary. Its only value in the kind of workflow envisaged by Michael is as an intermediate format (if needed) between the raw data and the web-server; again, being strongly prejudiced against the use of Excel for text processing, I would use Access instead.

At the end of the day, let's not forget that the final destination of this data is the web, not Michael's laptop.

MediaMatrix


 
Neil Coffey
Neil Coffey  Identity Verified
United Kingdom
Local time: 23:05
French to English
+ ...
Things to think about... Sep 9, 2010

If you're interested in programming, then what you're proposing will be a rewarding project. The only slight caveat I would suggest is that running a production server isn't quite the same as knocking up a high school project in MS Access. And if you just rely on the Acme Guide to Building a Dictionary Web Site in PHP without really knowing what you're doing, then you may run into security problems, performance problems which cause your hosting company to suspend your site, etc etc.

So, that said:

- as an initial *rough* version, you could indeed go some way with your data in .txt files and something knocked up in something like Perl/PHP;
- if you do have your data in text files, you probably need to have lots of small text files, not one large one; e.g. you can have a file ABA.txt, ABE.txt etc, based on the first three letters of the headword -- with the restriction that you can only really index on one thing this way (once you start wanting to index on multiple things, you may as well use a database IMO); this is potentially quite an acceptable way of doing things -- after all, the filing system is an indexing/caching system at the end of the day -- but you do need to be careful;
- whatever server-side language you choose, you need to read up on how to code web applications *securely* in that language -- scripting languages like Perl and PHP are notorious for resulting in applications that are full of security holes;
- if you use a database, again you need to read up on security (just as an example, the PHP/MySQL code you posted is vulnerable to a potentially very dangerous type of attack called a "SQL injection attack"-- it may be OK for a high school project but you should NOT use it like that in production);
- you need to be slightly careful about choosing a hosting package -- obviously you probably need server-side programming/scripting facilities of some kind (PHP, Perl, Ruby etc), but you also need to check what their fair use/resource allocation policy is -- a lot of the "build your web site for $1/month" type packages are really only designed for people building web sites for their dog with 10 hits per year, and any vaguely serious amount of traffic will push you over the limit -- e.g. you may find they're literally sharing 100 sites on one server and your limit is less than 1% of the machine's resources. Maybe for Version 1.0 you don't care about this, but if you're planning to grow your web site, it's something to consider -- moving and rewriting everything for a new hosting provider can be a time consuming process.
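
To make the SQL injection point concrete, here is a small sketch in Python with an in-memory SQLite database (a stand-in for PHP/MySQL; the table, the sample rows, and the function names are assumptions made for illustration). The first function builds the query by string concatenation; the second passes the search term as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dictionary (word TEXT, translation TEXT)")
conn.executemany(
    "INSERT INTO dictionary VALUES (?, ?)",
    [("apple", "manzana"), ("pear", "pera")],
)


def lookup_unsafe(word):
    # DON'T: the search term becomes part of the SQL text itself.
    sql = "SELECT translation FROM dictionary WHERE word = '" + word + "'"
    return [row[0] for row in conn.execute(sql)]


def lookup_safe(word):
    # DO: the driver passes the term as data via the ? placeholder.
    sql = "SELECT translation FROM dictionary WHERE word = ?"
    return [row[0] for row in conn.execute(sql, (word,))]


hostile = "x' OR '1'='1"
print(lookup_unsafe(hostile))  # every row comes back
print(lookup_safe(hostile))    # []
print(lookup_safe("apple"))    # ['manzana']
```

With the concatenated query, the hostile input rewrites the WHERE clause so that it matches every row; the same trick can be used to read or damage other data. The parameterised version treats the whole input as a single search term.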

The other thing to consider is: how much actual dictionary data do you currently have? If it's just a few pages' worth, then for Version 0.01, you could consider just putting together some static HTML pages (you may want to program something to turn your txt files into HTML), see how much traffic you actually get over the next 6 months, and then consider how much development is worthwhile.
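
A rough sketch of the "turn your txt files into HTML" step, in Python. It assumes one `source<TAB>target` pair per line; the `glossary_to_html` name and the bare-bones page layout are illustrative only.

```python
import html


def glossary_to_html(tsv_text, title="Glossary"):
    """Turn tab-delimited 'source<TAB>target' lines into one static HTML page."""
    rows = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].strip():
            continue  # skip blank or incomplete lines
        # Escape &, < and > so that terms cannot break the markup.
        source = html.escape(fields[0].strip())
        target = html.escape(fields[1].strip())
        rows.append(f"<tr><td>{source}</td><td>{target}</td></tr>")
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        "<body><table>\n" + "\n".join(rows) + "\n</table></body></html>\n"
    )


page = glossary_to_html("fiets\tbicycle\nR&D\tO&O\n")
print(page)
```

The resulting string can be written to a `.html` file and uploaded as-is, with no server-side code involved.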

If you don't mind all clients getting a complete copy of your data, you could potentially do it all client side with something like GWT (Google Web Toolkit), which essentially lets you write in Java and converts to JavaScript. (If you're masochistic, you could write it in "raw" JavaScript.)

Sorry, I know I haven't given a definitive "do this" answer -- but that's partly because I think you need to be clear about what the objectives of the project are, and how much time you're willing to put in.

[Edited at 2010-09-09 01:09 GMT]


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 00:05
English to Hungarian
+ ...
Responses Sep 9, 2010

Well, there isn't a lot I can agree with in what you posted.

mediamatrix wrote:

In my (not inconsiderable) experience of getting multi-sourced data into mySQL, pre-digesting text data in a text environment is 'almost always' the best place to start. The only exceptions would be where your text data has come undiluted and unpolluted from a proven text-base in the first place. Even if your data is, supposedly, a perfectly ordinary, well configured, tab-delimited text file, you would be amazed how often it will get mangled if you try to dump it 'unseen' into a database (or into Excel, for that matter). It does no harm - and can give valuable prior warning of potential data structure inconsistencies, right up front - to put the tab-delimited data through Word as a first step.

I have no idea why you would ever want to do anything like reviewing a table in Word. If you want to do checks/corrections before importing to SQL or whatever the final format will be, it's quite obvious that a spreadsheet program is the best choice. In Word, how do you shift a column one row up or down? How do you check for duplicates? How do you reorder, insert or delete columns or rows? How do you merge cells or columns? How do you switch cells B342 to B450 with C342 to C450 if someone got their languages mixed up? How do you sort the table alphabetically? How do you check for empty fields? All these common tasks are basically single-click actions in Excel or OOo Calc, while they would take a heroic effort in Word. Mass replacements are also a lot faster in Excel, and it can handle a lot more data a lot faster. Try doing 100,000 replacements in a table with 50,000 records in Word... not fun.
On the other hand, there is not a lot in this regard that Word is better at than Excel. All I can think of is things like removing extra spaces from the beginning/end of each cell or splitting cells based on a pattern using search and replace, and that sort of fancy stuff can only be done in Word without converting to a table.
If you just read the data and correct typos, do spell checks and similar trivial things, Excel works much the same as Word.
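
Some of these checks (empty fields, duplicates, wrong column counts) can also be scripted before the data goes anywhere near a database. A minimal Python sketch, assuming a two-column tab-delimited glossary; the `check_glossary` name and the wording of the messages are made up for illustration.

```python
from collections import Counter


def check_glossary(tsv_text, expected_columns=2):
    """Return a list of (line_number, problem) tuples for a tab-delimited glossary."""
    problems = []
    seen = Counter()
    for number, line in enumerate(tsv_text.splitlines(), start=1):
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != expected_columns:
            problems.append(
                (number, f"expected {expected_columns} columns, found {len(fields)}")
            )
            continue
        if any(field == "" for field in fields):
            problems.append((number, "empty field"))
        # Count case-insensitively so "Fiets" and "fiets" are treated as the same entry.
        key = tuple(field.lower() for field in fields)
        seen[key] += 1
        if seen[key] > 1:
            problems.append((number, "duplicate entry"))
    return problems


sample = "fiets\tbicycle\nhuis\t\nfiets\tbicycle\nboom\ttree\textra\n"
for number, problem in check_glossary(sample):
    print(number, problem)
```

A report like this only points at suspect lines; fixing them is still a job for whichever editor or spreadsheet you prefer.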

mediamatrix wrote:
FarkasAndras wrote:
If you need to get data into Excel for some reason, it can open tab separated files straight from the txt ..


Yes, it can. Subject to the earlier proviso regarding impeccable text formatting... And if the formatting is haywire, you're better off fixing it in a proper text environment.

Well, as explained above, IMO these fixes are a lot easier to do in Excel.
mediamatrix wrote:
FarkasAndras wrote:
... and then save them as xls.


XLS? - Who wants or needs XLS ? Michael certainly doesn't!

Comments on your style and telepathic powers aside, you yourself brought up Excel first. I don't see how or why you'd want to get your data in Excel and then not save it in xls, considering that you don't seem to see Excel as useful in editing glossaries.
If you really want an example, automatic batch conversion of tabbed txt or database files into neatly formatted xls files could be very useful for providing easy-to-use downloadable files for the less tech savvy users of the site. You could maintain and update your data in whatever format you choose (tab delimited, csv, xml, database) and generate dozens or hundreds of new up-to-date downloadable xls files with a single click.
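
A sketch of that kind of one-step export, in Python. Note the swap: it writes CSV, which Excel opens directly, rather than xls, because the Python standard library has no xls writer. The `glossary` table and the `export_csv` name are assumptions, and SQLite stands in for whatever format the data is maintained in.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE glossary (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO glossary VALUES (?, ?)",
    [("fiets", "bicycle"), ("huis", "house")],
)


def export_csv(conn, out_file):
    """Write the whole glossary table to out_file as CSV with a header row."""
    writer = csv.writer(out_file)
    writer.writerow(["source", "target"])
    count = 0
    for row in conn.execute("SELECT source, target FROM glossary ORDER BY source"):
        writer.writerow(row)
        count += 1
    return count


buffer = io.StringIO()
exported = export_csv(conn, buffer)
print(buffer.getvalue())
```

Calling `export_csv` once per glossary, with a file opened via `open(path, "w", encoding="utf-8", newline="")` in place of the `io.StringIO` buffer, would regenerate all the downloadable files in one run.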

mediamatrix wrote:
Personally, I would never, ever use Excel to process any text data - let alone a glossary.

Indeed, that's your personal preference. Excel is not that great for text in general, but IMO it's pretty great for tables like glossaries.


 
mediamatrix (X)
mediamatrix (X)
Local time: 18:05
Spanish to English
+ ...
I never for a moment imagined ... Sep 9, 2010

FarkasAndras wrote:
Well, there isn't a lot I can agree with in what you posted.


.. that you would agree with any of it

You know as well as I that there are a dozen or more ways to approach most data-processing tasks, and that the most appropriate choice of method in any particular case is much more dependent on personal preference/prejudice than on the particular technical merits of the tools used to get the job done.

If nothing else, this exchange will have demonstrated to Michael that he needs to carefully analyse his raw data, carefully assess the tools at his disposal (on both his computer and his web-server), and carefully plan how he wants to structure it in the web-base. Then, how he gets the data from A to B - tidying it up along the way if necessary - will depend primarily on what tools he is most comfortable with.

MediaMatrix


 
I would recommend Drupal CMS to start your online dictionary Jan 28, 2013

It is a very flexible and powerful CMS that lets you build any powerful web application without coding. I am already at the point of building something similar.

 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 23:05
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
@wojtekm: Jan 28, 2013

Hi wojtekm,

Could you perhaps provide us with an example of what you are working on (links?), or a little more information? I'd be curious to hear more about Drupal and the creation of online dictionaries/glossaries.

Michael


 

