The Book Scanning Project

29 replies to this topic

#21
ɹəuəllıʍ ʇɐb


    ドラえもん

  • Forum Moderator
  • 10,541 posts
  • Gender:Male
  • Location:Tōkyō
  • OS Architecture:64 Bit (x64)
Interesting links, indeed - thank you, John!


Doraemon - the robot cat from the future.


#22
ɹəuəllıʍ ʇɐb

Now that we have some 10,000 books in the database, I find that a lot of them (30% - 40%) do not have a barcode. While entering the book, this is not a big deal - the ISBN can be typed manually. Books that were printed before 1970 do not even have an ISBN, and all data are entered manually.

However, when these books are sold at the counter, they must be removed from the database. Books with a barcode are done within a second (with the scanner), but books without one must be noted down manually. This is very awkward when somebody buys 10 or 15 books...

So I have decided that we must print our own labels with barcodes on them:
  • the ISBN if available
  • or our own internal number (database key) if there is no ISBN
Examples:
[Attached images: GDB-ISBN-label.png, GDB-bookID-label.png]

Extracting the text data on top from the database is easy enough, but I have found that printing a barcode is not as straightforward as printing text:
  • you need a barcode font
  • you need to encode the number to make the barcode scannable

It took me a while to find a free barcode font (an EAN-13 barcode font is required), but I eventually found one at http://sourceforge.n...barcodes/files/

Encoding the ISBN or our internal numbers took more research; the basic information is at http://en.wikipedia.org/wiki/EAN-13, with additional info at http://www.barcodeis...com/ean13.phtml and http://www.barcodeis....com/ean8.phtml

So in the end, all I needed to do was write a few small JavaScript functions to encode those numbers. (The ISBN already contains the required check digit, but for our internal number a check digit must be calculated.)
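function EAN13(ean)
{
	// Characters of the barcode font for the three EAN-13 digit sets
	var L=['A','B','C','D','E','F','G','H','I','J'];	// left half, set L
	var G=['K','L','M','N','O','P','Q','R','S','T'];	// left half, set G
	var R=['a','b','c','d','e','f','g','h','i','j'];	// right half, set R
	var n=0;
	// The first digit is hard-coded as 9 (ISBN-based EANs start with 978 or 979).
	// The L/G pattern below (L G G L G L) is the one that belongs to a leading 9,
	// so this function must not be used for codes with another first digit.
	var s='9';

	for (var i = 1; i < 13; i++)
	{
		switch (i)
		{
			case 1:
			case 4:
			case 6:
				n = ean.charAt(i) - 0;	// "- 0" converts the character to a number
				s += L[n];
				break;
			case 2:
			case 3:
			case 5:
				n = ean.charAt(i) - 0;
				s += G[n];
				break;
			case 7:
				s += '*';	// separator between the two halves; falls through on purpose
			default:
				n = ean.charAt(i) - 0;
				s += R[n];
				break;
		}
	}
	s += '+';	// end marker
	return s;
}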

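function EAN8(book_id)
{
	var L=['A','B','C','D','E','F','G','H','I','J'];	// left half
	var R=['a','b','c','d','e','f','g','h','i','j'];	// right half
	var n=0;
	var s=':';	// start character for EAN-8
	var ean8 = lpad(book_id+'',7);	// pad the internal number to 7 digits

	ean8 += EAN8cd(ean8);	// append the check digit as 8th digit

	for (var i = 0; i < 8; i++)
	{
		switch (i)
		{
			case 0:
			case 1:
			case 2:
			case 3:
				n = ean8.charAt(i) - 0;
				s += L[n];
				break;
			case 4:
				s += '*';	// separator between the two halves; falls through on purpose
			default:
				n = ean8.charAt(i) - 0;
				s += R[n];
				break;
		}
	}
	s += '+';	// end marker
	return s;
}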

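function EAN8cd(ean)
{
	// Weighted sum of the 7 data digits with weights 3,1,3,1,3,1,3
	// (the multiplication converts each character to a number)
	var checkDigit = 10 - ((
		3 * ean.charAt(0) +
		1 * ean.charAt(1) +
		3 * ean.charAt(2) +
		1 * ean.charAt(3) +
		3 * ean.charAt(4) +
		1 * ean.charAt(5) +
		3 * ean.charAt(6)) % 10);

	// A result of 10 means the sum was already a multiple of 10
	if (checkDigit == 10)
		return '0';
	else
		return checkDigit+'';
}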

function lpad(number, length)
{
    var str = '' + number;
    while (str.length < length)
        str = '0' + str;
   
    return str;
}
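
Since EAN13() above trusts the check digit that is already part of the ISBN, it may be worth verifying a manually typed number before printing a label. The following is only a minimal sketch of the standard EAN-13 check (weights 1 and 3 alternating over the first 12 digits); the function names ean13CheckDigit and isValidEan13 are made up for this example and are not part of the original script.

```javascript
// Sketch only: both function names are hypothetical helpers.

// Returns the EAN-13 check digit (as a string) for the first 12 digits.
function ean13CheckDigit(first12) {
    var sum = 0;
    for (var i = 0; i < 12; i++) {
        // Positions 1, 3, 5, ... have weight 1; positions 2, 4, 6, ... have weight 3
        var weight = (i % 2 === 0) ? 1 : 3;
        sum += Number(first12.charAt(i)) * weight;
    }
    return String((10 - (sum % 10)) % 10);
}

// Returns true if the string consists of 13 digits and the last one is the correct check digit.
function isValidEan13(code) {
    if (!/^[0-9]{13}$/.test(code)) {
        return false;
    }
    return ean13CheckDigit(code.slice(0, 12)) === code.charAt(12);
}

console.log(isValidEan13('9780306406157')); // true
console.log(isValidEan13('9780306406158')); // false
```

A label would then only be printed when isValidEan13() returns true, which catches most typing errors in a manually entered ISBN.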


#23
ɹəuəllıʍ ʇɐb

I have made some ISBN tools, and added them to my public website: http://www2.gol.com/...bw/makeisbn.htm

The tools are for now
  • make an ISBN for books that don't have any
  • convert between 10-digit ISBN and 13-digit EAN
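
The conversion between the two formats can be sketched as follows. This is not the code behind the tools on the website, just an illustration of the usual rules: an ISBN-10 becomes an EAN by prefixing 978 and recalculating the check digit, and the reverse works only for EANs that start with 978. The function names isbn10to13 and isbn13to10 are hypothetical.

```javascript
// Sketch only: function names are hypothetical, not taken from the tools page.

// ISBN-10 -> 13-digit EAN: prefix "978", drop the old check digit, compute the new one.
function isbn10to13(isbn10) {
    var chars = isbn10.replace(/[^0-9Xx]/g, '');   // remove hyphens and spaces
    if (chars.length !== 10) {
        throw new Error('expected a 10-character ISBN');
    }
    var body = '978' + chars.slice(0, 9);
    var sum = 0;
    for (var i = 0; i < 12; i++) {
        sum += Number(body.charAt(i)) * (i % 2 === 0 ? 1 : 3);
    }
    return body + String((10 - (sum % 10)) % 10);
}

// 13-digit EAN -> ISBN-10: only possible for the 978 prefix.
function isbn13to10(isbn13) {
    var digits = isbn13.replace(/[^0-9]/g, '');
    if (digits.length !== 13 || digits.slice(0, 3) !== '978') {
        throw new Error('expected a 13-digit EAN starting with 978');
    }
    var body = digits.slice(3, 12);
    var sum = 0;
    for (var i = 0; i < 9; i++) {
        sum += Number(body.charAt(i)) * (10 - i);   // weights 10 down to 2
    }
    var check = (11 - (sum % 11)) % 11;
    return body + (check === 10 ? 'X' : String(check));
}

console.log(isbn10to13('0-306-40615-2'));     // 9780306406157
console.log(isbn13to10('978-0-306-40615-7')); // 0306406152
```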



#24
ɹəuəllıʍ ʇɐb

There is constantly work to be done on the project. All books are in the database now, but almost daily we find that a book is in the database, yet we cannot actually find it where it is supposed to be! So it's time to conduct an inventory; this will also help us to correct titles and author names, as well as identify books that still don't have a barcode label.

Writing database entries out to an Excel document is easy enough, but I encountered a strange problem: all accented characters were somehow transformed into some strange Chinese characters, e.g. John Le Carré ended up as John Le Carr鼯

I tried and tried, but I was unable to prevent that. I also searched the Internet, but it seems nobody has encountered this problem before.

I finally found that using the HTML entity equivalent (e.g. &auml; for 'ä') solves my problem. So I wrote a little Python function (Python being much easier than JavaScript) that transforms all accented characters into their HTML entity equivalents. Here is the function (instr is the input argument):
def accents_to_entities(instr):    # function name chosen for illustration; the post only names the argument
    s = instr
    s = s.replace('À','&Agrave;')
    s = s.replace('Á','&Aacute;')
    s = s.replace('Â','&Acirc;')
    s = s.replace('Ã','&Atilde;')
    s = s.replace('Ä','&Auml;')
    s = s.replace('Å','&Aring;')
    s = s.replace('Æ','&AElig;')
    s = s.replace('Ç','&Ccedil;')
    s = s.replace('È','&Egrave;')
    s = s.replace('É','&Eacute;')
    s = s.replace('Ê','&Ecirc;')
    s = s.replace('Ë','&Euml;')
    s = s.replace('Ì','&Igrave;')
    s = s.replace('Í','&Iacute;')
    s = s.replace('Î','&Icirc;')
    s = s.replace('Ï','&Iuml;')
    s = s.replace('Ñ','&Ntilde;')
    s = s.replace('Ò','&Ograve;')
    s = s.replace('Ó','&Oacute;')
    s = s.replace('Ô','&Ocirc;')
    s = s.replace('Õ','&Otilde;')
    s = s.replace('Ö','&Ouml;')
    s = s.replace('Ø','&Oslash;')
    s = s.replace('Ù','&Ugrave;')
    s = s.replace('Ú','&Uacute;')
    s = s.replace('Û','&Ucirc;')
    s = s.replace('Ü','&Uuml;')
    s = s.replace('Ý','&Yacute;')
    s = s.replace('ß','&szlig;')
    s = s.replace('à','&agrave;')
    s = s.replace('á','&aacute;')
    s = s.replace('â','&acirc;')
    s = s.replace('ã','&atilde;')
    s = s.replace('ä','&auml;')
    s = s.replace('å','&aring;')
    s = s.replace('æ','&aelig;')
    s = s.replace('ç','&ccedil;')
    s = s.replace('è','&egrave;')
    s = s.replace('é','&eacute;')
    s = s.replace('ê','&ecirc;')
    s = s.replace('ë','&euml;')
    s = s.replace('ì','&igrave;')
    s = s.replace('í','&iacute;')
    s = s.replace('î','&icirc;')
    s = s.replace('ï','&iuml;')
    s = s.replace('ñ','&ntilde;')
    s = s.replace('ò','&ograve;')
    s = s.replace('ó','&oacute;')
    s = s.replace('ô','&ocirc;')
    s = s.replace('õ','&otilde;')
    s = s.replace('ö','&ouml;')
    s = s.replace('ø','&oslash;')
    s = s.replace('ù','&ugrave;')
    s = s.replace('ú','&uacute;')
    s = s.replace('û','&ucirc;')
    s = s.replace('ü','&uuml;')
    s = s.replace('ý','&yacute;')
    s = s.replace('ÿ','&yuml;')
    return s
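
For comparison, here is a minimal JavaScript sketch of a related but different technique: instead of a table of named entities, every character above the ASCII range is replaced by its numeric character reference (e.g. &#233; for 'é'). This is not what the Python function above does, and whether Excel treats numeric references the same way as named entities is an assumption that would need to be tested. The function name toNumericEntities is made up for this example.

```javascript
// Sketch only: toNumericEntities is a hypothetical name.
// Replaces every non-ASCII character with a numeric character reference.
function toNumericEntities(text) {
    // Array.from walks the string by code point, so characters outside
    // the Basic Multilingual Plane are handled as one unit.
    return Array.from(text, function (ch) {
        var codePoint = ch.codePointAt(0);
        return codePoint > 127 ? '&#' + codePoint + ';' : ch;
    }).join('');
}

console.log(toNumericEntities('John Le Carr\u00E9')); // John Le Carr&#233;
```

The advantage of this variant is that no character list has to be maintained; the drawback is that the output is harder to read than named entities.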

Edited by ɹəuəllıʍ ʇɐb, 11 July 2012 - 08:48: bloody forum editor!


#25
MANEMAN


    Former Nano Technologist & Non-Resident eedjut

  • Member
  • 2,085 posts
  • Gender:Male
  • Location:CORNWALL. U.K.
Sounds as though you are having fun Pat. :)
Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime.

It is better sometimes to point a man towards the pathway to a goal rather than the goal itself.
The pathway is where he will learn, - and remember.

#26
ɹəuəllıʍ ʇɐb

The project is of course constantly evolving, but most of it is not interesting enough to document here. However, I did find something very interesting yesterday...

For various reasons (one of them is that the shop has moved to another location) I need to update a large number of books. To bring up the book details I need to enter the book_id as the key.

This means that I need to use the mouse each time to click on the key input field; I found that very annoying!

Yesterday, by pure chance, I found the HTML attribute accesskey (it works on input fields, not only on textarea) - with this I can specify a single character that acts as a keyboard shortcut to jump to that input field. So in my case I coded
<label for="search_book_id">
<span style="color:#330066; font-weight: bold;">Book ID: </span>
<input type="text"
	id="search_book_id"
	name="search_book_id"
	class="input"
	size="6"
	style="margin-left:3px;"
	autocomplete="off"
	accesskey="I"
	title="go here with Alt+Shift+I" />
</label>
Now when I press Alt+Shift+I the cursor jumps right into my key input field :happy: (The modifier keys depend on the browser; Alt+Shift is the combination Firefox uses on Windows.)

#27
doug


    WF Moderator

  • Forum Moderator
  • 7,618 posts
  • Gender:Male
  • Location:Washington, Tyne & Wear
Pat, one day you will tell us that it's finished and you've read every one!

Have come to my senses and realised I'm not doing the Great North Run any more. An age related decision!


#28
ɹəuəllıʍ ʇɐb

That's the beauty of it; it will never finish as long as the shop exists! All books are in the database, but there are constantly items going out and coming in.

Trying to read all the books in the shop is a project on about the same scale as trying all the restaurants in Tokyo...



#29
Chris81042


    Newbie

  • Member
  • 1 posts
  • Gender:Male
  • Location:USA
Nice FORUM Is This.

Edited by Chris81042, 17 September 2012 - 11:58.


#30
ɹəuəllıʍ ʇɐb

I have updated and expanded my script from post #6; it can now run under UTF-8 encoding.
function handle_accent(instr)
{
    console.log(instr);         // debug output
    console.log(toHex(instr));  // debug output; toHex() is a helper that is not shown in this post
    var r = instr;
    r = r.replace(new RegExp(/[\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5]/g),'A');
    r = r.replace(new RegExp(/[\u00C6]/g),'AE');
    r = r.replace(new RegExp(/[\u00C7]/g),'C');
    r = r.replace(new RegExp(/[\u00C8\u00C9\u00CA\u00CB]/g),'E');
    r = r.replace(new RegExp(/[\u00CC\u00CD\u00CE\u00CF]/g),'I');
    r = r.replace(new RegExp(/[\u00D1]/g),'N');
    r = r.replace(new RegExp(/[\u00D2\u00D3\u00D4\u00D5\u00D6\u00D8]/g),'O');
    r = r.replace(new RegExp(/[\u00D9\u00DA\u00DB\u00DC]/g),'U');
    r = r.replace(new RegExp(/[\u00DD]/g),'Y');
    r = r.replace(new RegExp(/[\u00DF]/g),'ss');
    r = r.replace(new RegExp(/[\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5]/g),'a');
    r = r.replace(new RegExp(/[\u00E6]/g),'ae');
    r = r.replace(new RegExp(/[\u00E7]/g),'c');
    r = r.replace(new RegExp(/[\u00E8\u00E9\u00EA\u00EB]/g),'e');
    r = r.replace(new RegExp(/[\u00EC\u00ED\u00EE\u00EF]/g),'i');
    r = r.replace(new RegExp(/[\u00F1]/g),'n');
    r = r.replace(new RegExp(/[\u00F2\u00F3\u00F4\u00F5\u00F6\u00F8]/g),'o');
    r = r.replace(new RegExp(/[\u00F9\u00FA\u00FB\u00FC]/g),'u');
    r = r.replace(new RegExp(/[\u00FD\u00FF]/g),'y');
    r = r.replace(new RegExp(/[\u0100\u0102]/g),'A'); // A w/macron, breve
    r = r.replace(new RegExp(/[\u0101\u0103]/g),'a'); // a w/macron, breve
    r = r.replace(new RegExp(/[\u0106\u0108\u010A\u010C]/g),'C'); // C w/acute, circumflex, dot, caron
    r = r.replace(new RegExp(/[\u0107\u0109\u010B\u010D]/g),'c'); // c w/acute, circumflex, dot, caron
    r = r.replace(new RegExp(/[\u0110]/g),'D'); // D w/stroke
    r = r.replace(new RegExp(/[\u0111]/g),'d'); // d w/stroke
    r = r.replace(new RegExp(/[\u0112\u0116\u011A]/g),'E'); // E w/macron, dot, caron
    r = r.replace(new RegExp(/[\u0113\u0117\u011B]/g),'e'); // e w/macron, dot, caron
    r = r.replace(new RegExp(/[\u011C\u011E\u0120]/g),'G'); // G w/circumflex, breve, dot
    r = r.replace(new RegExp(/[\u011D\u011F\u0121]/g),'g'); // g w/circumflex, breve, dot
    r = r.replace(new RegExp(/[\u0128\u012A\u012C\u0130]/g),'I'); // I w/tilde, macron, breve, dot
    r = r.replace(new RegExp(/[\u0129\u012B\u012D\u0131]/g),'i'); // i w/tilde, macron, breve, dotless
    r = r.replace(new RegExp(/[\u0141]/g),'L'); // L w/stroke
    r = r.replace(new RegExp(/[\u0142]/g),'l'); // l w/stroke
    r = r.replace(new RegExp(/[\u0143\u0145\u0147]/g),'N'); // N w/acute, cedilla, caron
    r = r.replace(new RegExp(/[\u0144\u0146\u0148]/g),'n'); // n w/acute, cedilla, caron
    r = r.replace(new RegExp(/[\u014C\u014E\u0150]/g),'O'); // O w/macron, breve, double acute
    r = r.replace(new RegExp(/[\u014D\u014F\u0151]/g),'o'); // o w/macron, breve, double acute
    r = r.replace(new RegExp(/[\u0152]/g),'OE');
    r = r.replace(new RegExp(/[\u0153]/g),'oe');
    r = r.replace(new RegExp(/[\u0154\u0156\u0158]/g),'R'); // R w/acute, cedilla, caron
    r = r.replace(new RegExp(/[\u0155\u0157\u0159]/g),'r'); // r w/acute, cedilla, caron
    r = r.replace(new RegExp(/[\u015A\u015C\u015E\u0160]/g),'S'); // S w/acute, circumflex, cedilla, caron
    r = r.replace(new RegExp(/[\u015B\u015D\u015F\u0161]/g),'s'); // s w/acute, circumflex, cedilla, caron
    r = r.replace(new RegExp(/[\u0168\u016A\u016C\u0170]/g),'U'); // U w/tilde, macron, breve, double acute
    r = r.replace(new RegExp(/[\u0169\u016B\u016D\u0171]/g),'u'); // u w/tilde, macron, breve, double acute
    r = r.replace(new RegExp(/[\u0174]/g),'W'); // W w/circumflex
    r = r.replace(new RegExp(/[\u0175]/g),'w'); // w w/circumflex
    r = r.replace(new RegExp(/[\u0176\u0178]/g),'Y'); // Y w/circumflex, diaeresis
    r = r.replace(new RegExp(/[\u0177]/g),'y'); // y w/circumflex
    r = r.replace(new RegExp(/[\u0179\u017B\u017D]/g),'Z'); // Z w/acute, dot, caron
    r = r.replace(new RegExp(/[\u017A\u017C\u017E]/g),'z'); // z w/acute, dot, caron
    if (r == instr)
	    return '';
    else
	    return r;
}
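
A shorter alternative to the long replacement table is Unicode normalization. The sketch below is not the script above but a different technique: String.prototype.normalize('NFD') splits an accented letter into base letter plus combining mark, and the marks are then removed. Letters that have no such decomposition (ß, æ, ø, đ, ł, œ, dotless i) still need a small table. The function name stripAccents is made up, and unlike handle_accent() it always returns the converted string instead of '' when nothing changed.

```javascript
// Sketch only: stripAccents is a hypothetical name.
function stripAccents(text) {
    // Letters without a canonical decomposition must be mapped by hand.
    var special = {
        '\u00DF': 'ss', '\u00C6': 'AE', '\u00E6': 'ae',
        '\u00D8': 'O',  '\u00F8': 'o',
        '\u0110': 'D',  '\u0111': 'd',
        '\u0141': 'L',  '\u0142': 'l',
        '\u0152': 'OE', '\u0153': 'oe',
        '\u0131': 'i'
    };
    return text
        .normalize('NFD')                   // e.g. e-acute becomes e + combining acute
        .replace(/[\u0300-\u036F]/g, '')    // remove the combining marks
        .replace(/[\u00DF\u00C6\u00E6\u00D8\u00F8\u0110\u0111\u0141\u0142\u0152\u0153\u0131]/g,
            function (ch) { return special[ch]; });
}

console.log(stripAccents('John Le Carr\u00E9')); // John Le Carre
```

This covers all accented Latin letters that decompose, including ones not listed in the table above, at the cost of requiring a JavaScript engine that supports normalize().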





