The Hurdle in Unicode Adoption in Indian Languages – Print and Unicode Fonts

There is a dearth of quality User Generated Content in Indian Languages on the Internet. What User Generated Content there is has been created primarily by tech enthusiasts. The newspapers have now come online. Some of them have actually moved to Unicode text, but many are still just converting their print editions to e-papers! That is not a great way to put discoverable content on the Internet.

The content actually consumed by the Indian Language market is mostly created for print: books, newspapers and so on. And Unicode adoption there is low. There are historical reasons for that, of course. Desktop Publishing technologies came to publishing before Unicode had become prevalent, so people in the industry got used to non-standard fonts. If they move to Unicode now, all the typists and the editors who work on typed content need to relearn typing. Obviously, there is resistance. Plus, many Desktop Publishing programs still do not support the Complex Text Layout (CTL) needed for displaying Indian Languages correctly (see the previous post for an explanation of Complex Text Layout).

But there is another issue too. It is difficult to sell the idea even to the people higher up (who could mandate that the staff relearn things and sanction the purchase of the right software), because good Unicode compatible fonts are not available. For Hindi, Mangal and Arial Unicode MS are the two fonts available on a Windows machine. Mangal just does not look good in print. Arial Unicode MS is slightly better, but designers want more choices. A lot of the Unicode fonts available from CDAC and other sites are downright ugly.

I am not sure why better fonts are not coming to the market. Is it some kind of chicken-and-egg situation? Users resist the change and hence are not asking for Unicode fonts. Font companies, therefore, do not feel there is a market for Unicode fonts. And there probably aren’t enough techies around who understand both font design and Indian Languages well enough to create beautiful, free fonts for Indian Languages.

Print adopting Unicode is very important for meaningful Unicode adoption for Indian Languages. Where do we start?

Indian Language typing on Computer

Characters and Fonts

Before getting into Indian Language typing, let’s understand how the computer understands what we type. We will use examples from English typing, since that is the language best handled by and understood on computers.

The computer understands individual characters. Every letter, number and symbol is a character to the computer: ‘A’ is a character, ‘a’ is another character, “,” (comma) is another character and so on.

Then there are fonts, which tell the computer how to display a particular character. So, the same set of characters “Pothi.com” will appear different in different fonts.

As you can see, the “P” displayed by the font “Times New Roman” is different from the “P” displayed by the font “Arial”, which in turn is different from the “P” displayed by the font “Monotype Corsiva”. The same is true of other characters like “o”, “t” etc.

What is important here is that the computer still knows that a “P” is a “P”, irrespective of the font it is displayed in. That’s why the “Find” or “Search” feature finds the word or character you searched for, irrespective of how the font displays it. In fact, the underlying system that recognizes which character it is does not care at all how the font displays it. In the following image, the first line is “Pothi.com” in a font called “MT Extra”. The second line is the name of the font!

As you can see, the display makes no sense to an English-reading human. But the computer does not care.
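
For the curious, here is a small Python sketch of the same idea (Python is simply our choice for illustration). It shows that the computer stores “Pothi.com” as a sequence of character codes, and that searching works on those codes, so the font used to display the text plays no part.

```python
text = "Pothi.com"

# What the computer actually stores: one numeric code per character.
# A font only decides how each code is drawn on screen or on paper.
codes = [ord(ch) for ch in text]
print(codes)  # [80, 111, 116, 104, 105, 46, 99, 111, 109]

# "Find" compares these stored codes, so it succeeds whether the text is
# shown in Times New Roman, Arial or MT Extra.
print(text.find("P"))  # 0
print("thi" in text)   # True
```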

What this means is that you can design a font that displays certain English characters as certain letters of one of the Indian languages. We’ll take Hindi as an example.

The same set of characters, “Pothi.com”, in a font called “Kruti Dev 010” becomes the following:

Of course, it does not look anything like “Pothi.com” to English readers. Hindi readers can see Hindi letters (though not in a meaningful combination). But to the computer, it is still “Pothi.com”.

However, with this font to aid me, I can now concoct character combinations that look like meaningful Hindi words to Hindi readers. For example, the character set “dje #i” generates the following:

Hindi readers can identify meaningful words here, even though for the computer it is just “dje #i”.

This is one way of typing Hindi. Most Hindi books are typeset this way, using one of the fonts that display an English character as a Hindi letter.

When the ultimate aim is to print, this method works just fine. Once the book is printed, nobody cares what the original character stored in the computer was.

But this method has issues, big ones. For example:

  1. No standardization: When the letters of your language have no characters assigned to them in the computer, each font developer is free to decide which character should be displayed as which letter. So, one font decides to display “A” as “अ” and another font decides to display “d” as “अ”. What do you do then? In English, you can write something and then change the font at the click of a button. But in Hindi, if you change the font after writing, you will get totally different letters on the screen, which are likely to be meaningless. Plus, for each font you have to learn typing all over again!
     Lack of standardization is also a problem in the Internet world. If you type content in one font and send it to someone, the recipient needs the same font on his computer to see the meaningful text you have written. No other font will do. Compare this to English, where you may type in a font that the other person does not have, and he can still read it, because whatever English font he has understands the underlying characters and displays the correct letters for an English reader.
  2. Not searchable: In this system, the computer does not understand the underlying characters of the Hindi language. As far as the computer is concerned, it is just English characters wearing a different look. So, there is no good way of searching through this content. In the Internet age, this is a major disadvantage. A lot of the content available on the Internet today is discovered only through search, and if you want your content to be discovered, it is important that it is typed in a way that makes it searchable.
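
Both problems can be illustrated with a toy Python sketch. The two “fonts” below are hypothetical lookup tables we made up (they are not the real Kruti Dev mapping or any other font's table); each decides on its own which stored English character is drawn as which Hindi letter, following the “A” vs “d” example above.

```python
# Hypothetical legacy (non-Unicode) Hindi fonts: stored character -> letter drawn.
# These tables are invented for illustration only.
FONT_X = {"A": "अ", "j": "र"}
FONT_Y = {"d": "अ", "k": "र"}

def render(stored_text: str, font: dict) -> str:
    """What a reader sees: characters the font does not remap are drawn as-is."""
    return "".join(font.get(ch, ch) for ch in stored_text)

stored = "Aj"  # what the computer actually saves

print(render(stored, FONT_X))  # Hindi letters: meaningful in font X
print(render(stored, FONT_Y))  # Aj: switch to font Y and the Hindi is gone

# Problem 1 again: with font Y, the same Hindi letters need different keys.
print(render("dk", FONT_Y))

# Problem 2: the file contains only "A" and "j", so searching for the
# Hindi letter finds nothing.
print("अ" in stored)  # False
```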

Unicode

It is to solve such problems that Unicode came into the picture. You can think of Unicode as something that enables computers to understand characters beyond the English language. So, if your computer supports Unicode, it understands not only the characters corresponding to “A”, “d”, “,” etc. but also the ones for “क्”, “अ” etc. And it’s not just the Indian languages: it also understands the characters of Chinese, Japanese, Arabic, Russian and most other major languages of the world!

So, with this you do not need to represent a random English character as a Hindi letter. Characters are available for Hindi, and a font can display those characters as the corresponding Hindi letters. Such fonts are called “Unicode compatible fonts”. To repeat, Unicode compatible Hindi fonts are the ones that do not represent English characters as Hindi letters, but represent the Hindi characters as the corresponding Hindi letters.
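
A short Python check makes this concrete. Python’s standard `unicodedata` module reports the official Unicode name of each character, and it shows that the half letter “क्” is really two characters: the full letter followed by the halant (virama) sign.

```python
import unicodedata

# Each Hindi letter has its own standard code point in Unicode,
# the same on every computer and in every Unicode compatible font.
letters = "\u0905\u0915"  # अ, क
for ch in letters:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0905 DEVANAGARI LETTER A
# U+0915 DEVANAGARI LETTER KA

# The half letter क् is stored as KA followed by the VIRAMA (halant) sign.
half_ka = "\u0915\u094d"
print([unicodedata.name(ch) for ch in half_ka])
# ['DEVANAGARI LETTER KA', 'DEVANAGARI SIGN VIRAMA']
```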

Typing and Input Method Editors (IMEs)

So far, so good. The computer, somehow, understands the characters for Hindi and other languages. But how do you type in those languages? Your keyboard still has only English letters on it. When you press the key labeled “A”, the computer knows that you want to type the character “a”. But how do you tell the computer that you want to type the character “अ”?

Multi-language keyboards are a design challenge, and, at least for Indian languages, nothing great has come out of it. So, different ways have been devised to use the same English keyboard for inputting non-English characters. To understand how these work, consider this: as far as the computer is concerned, “A” and “a” are two different characters. But on the keyboard, the same key is used to type either of them. How? “A” gets typed if either CAPS LOCK is on or the Shift key is pressed (on Windows, using both together gives you “a” again). Otherwise, it is “a” that gets typed. So, the computer decides whether the character typed is “A” or “a” depending not only on the key pressed, but also on the state of the CAPS LOCK and Shift keys.
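
Here is that decision written as a tiny Python function. It models the Windows behaviour, where pressing Shift while CAPS LOCK is on gives the small letter again; other systems may behave differently, and all other keys are ignored in this sketch.

```python
def typed_character(key: str, caps_lock: bool, shift: bool) -> str:
    """Character produced by a letter key, given the keyboard state.

    Windows-style rule (an assumption of this sketch): the capital letter
    appears when exactly one of CAPS LOCK and Shift is active.
    """
    if caps_lock != shift:
        return key.upper()
    return key.lower()

print(typed_character("a", caps_lock=False, shift=False))  # a
print(typed_character("a", caps_lock=False, shift=True))   # A
print(typed_character("a", caps_lock=True, shift=False))   # A
print(typed_character("a", caps_lock=True, shift=True))    # a
```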

Following a similar tactic, we can give the computer some other signal that, when the key labeled “A” is pressed, it should enter neither “a” nor “A”, but “अ”. How do we give that signal? There are multiple methods. Basically, computer programs have been created that sit between the keyboard and the computer storing the characters and, depending on certain signals, tell the computer which character has been entered. These programs are typically called “Input Method Editors (IMEs)”.

These IMEs do two things:

  1. They give you a way to specify the language you will be typing in
  2. They assign particular keys on the keyboard to particular characters, depending on the language selected
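
The two jobs can be sketched as a toy IME in Python. Only the two INSCRIPT keys mentioned in this post (Shift+D for “अ” and “j” for “र”) are filled in; the class and method names are our own invention, not Microsoft's or Google's, and a real IME maps every key and does much more.

```python
# Key map for the Hindi INSCRIPT layout: (key, shift pressed) -> character.
# Only the two keys described in this post are included.
INSCRIPT_HINDI = {
    ("d", True): "\u0905",   # Shift+D -> अ
    ("j", False): "\u0930",  # j       -> र
}

class ToyIME:
    """Hypothetical IME sitting between the keyboard and the stored text."""

    def __init__(self) -> None:
        self.language = "English"

    def select_language(self, language: str) -> None:
        # Job 1: remember which language the user wants to type in.
        self.language = language

    def key_press(self, key: str, shift: bool = False) -> str:
        # Job 2: turn the key press into a character for that language.
        if self.language == "Hindi":
            # Keys missing from this toy map are passed through unchanged.
            return INSCRIPT_HINDI.get((key, shift), key)
        return key.upper() if shift else key

ime = ToyIME()
print(ime.key_press("d", shift=True))  # D
ime.select_language("Hindi")
print(ime.key_press("d", shift=True))  # अ
print(ime.key_press("j"))              # र
```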

Two examples of IMEs for Indian languages are:

  1. Microsoft’s Indic Language IME
  2. Google’s Indic IME

I have the first one installed on my computer, and I’ll use it as an example to illustrate how an IME works. Once I install and configure Microsoft’s Indic IME for Hindi, I get a language selection button on my taskbar.

If I select English here, things work as usual. If I select Hindi, pressing Shift+D on my keyboard types “अ” instead of “D”. Pressing “j” types “र” and so on. I can keep switching languages and type a piece of text that uses both (as I am doing now).

To use an Indic IME like this, you still need to learn the key combinations that type the right characters for you. These combinations may vary between IMEs. In fact, even the same IME may give you different options for mapping keys to characters. Microsoft’s Indic IME provides at least two such mappings for Hindi. One is called the “INSCRIPT” layout, which I use (the key-character combinations I described in the previous paragraph were according to this layout). The other is Phonetic and, as the name suggests, its key-character bindings are more phonetic; e.g. “A” types “अ”, “R” types “र” and so on.

But the advantage over the earlier scheme of using non-Unicode fonts is that once you have learned one IME, you can use any Unicode compatible font. You don’t need to learn the map for every font separately. Plus, your text is standards-compliant and searchable!
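
A toy Python sketch shows why this works. The two layouts below contain only the example keys from this post; they use different keys but produce the same standard Unicode characters, so the saved text, and therefore any search over it, is identical.

```python
# Partial layouts using only the example keys from this post.
INSCRIPT = {"Shift+D": "\u0905", "j": "\u0930"}  # अ, र
PHONETIC = {"A": "\u0905", "R": "\u0930"}        # अ, र

def type_keys(layout: dict, keys: list) -> str:
    """Join the characters that the given key presses produce."""
    return "".join(layout[k] for k in keys)

via_inscript = type_keys(INSCRIPT, ["Shift+D", "j"])
via_phonetic = type_keys(PHONETIC, ["A", "R"])

print(via_inscript == via_phonetic)    # True: identical stored text
print("\u0905\u0930" in via_phonetic)  # True: searchable either way
```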

Google’s Transliteration Technology – Saviour for beginners!

If you are a beginner at Hindi typing, you would probably want to use an IME with a Phonetic key-character mapping, for example one where “A” types “अ”, “R” types “र” etc. It is easier than other mappings, where the assignments may be very arbitrary.

But you still need to learn the exact key combinations for typing something. If you need to type Pothi (पोथी) in Hindi, do you type “Pothi”, “Pothee” or “Pothii”? With most IMEs, only one of these will work.

Google’s IME is different here. It works more intelligently. Instead of assigning fixed keys to characters, it guesses the correct word from the combination you have entered. Basically, from the various words you could possibly mean, it picks one based on grammatical correctness and how frequently the word is used in the language. If it guesses the wrong word, you have a way to change it to a different word. In Google Transliteration, all three of “Pothi”, “Pothee” and “Pothii” produce the same (and correct) word, पोथी.
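
The idea of accepting several spellings can be illustrated with a toy Python sketch. To be clear, this is not how Google Transliteration works internally (that is not described here); the spelling rules, the word list and the frequency scores are all invented. The sketch reduces different Roman spellings to one key and then picks the candidate word with the highest score.

```python
# Invented spelling rules: different ways of writing the long "ee" sound.
SPELLING_RULES = [("ee", "i"), ("ii", "i")]

# Invented dictionary: normalized key -> (Hindi word, made-up frequency score).
CANDIDATES = {
    "pothi": [("पोथी", 90), ("पोथि", 5)],  # second entry: a less likely candidate
}

def normalize(roman: str) -> str:
    key = roman.lower()
    for pattern, replacement in SPELLING_RULES:
        key = key.replace(pattern, replacement)
    return key

def transliterate(roman: str) -> str:
    options = CANDIDATES.get(normalize(roman))
    if not options:
        return roman  # no guess available: leave the input unchanged
    # Pick the candidate word with the highest frequency score.
    word, _score = max(options, key=lambda pair: pair[1])
    return word

for spelling in ["Pothi", "Pothee", "Pothii"]:
    print(spelling, "->", transliterate(spelling))
```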

So, you can essentially type words, and as long as they are phonetically close, this IME will find the suitable word for you. This makes it a great tool for beginners. You can get started right away, write Hindi the way you do while chatting with your friends or in SMS, and start getting output in a Unicode compatible font.

Google actually has an online service for this – http://www.google.co.in/transliterate . So you don’t even have to download and install anything.

All is not well here though. Once you start typing in Hindi regularly, you will start feeling the limitations of Google Transliteration. We will not get into the details here. But if that happens at some point, it may make sense to invest some time in learning another IME, which uses fixed combinations.

Complex Text Layout (CTL)

There is one major difference between Hindi (and most Indian languages) and English. In fact, the same difference exists between Hindi and Russian, Hindi and Chinese, or Hindi and Japanese. In Hindi and many other Indian languages, the representation of a character changes depending on its context. The representation of “द्”, for example, is different in the words तद्भव and विद्या. One has “द्” before “भ” and the other has “द्” before “य”. Compare this to English, where how “d” is displayed does not depend on which letters come before and after it. So, to display Hindi correctly, the computer needs to understand all the possible ways of displaying the characters in different contexts. The technical term for this is “Complex Text Layout”. Most computers with modern operating systems have this ability now, and in all likelihood you will not have to do anything special about it. But if you find that a character is identified correctly but not displayed correctly, you will know that the computer is not handling “Complex Text Layout”.
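
A short Python sketch shows that the stored text is not the issue. Both words contain exactly the same two characters for “द्” (DA followed by the halant, or virama, sign); how that pair is drawn next to “भ” or “य” is decided only at display time, by the font and the operating system’s complex text layout support.

```python
import unicodedata

# तद्भव and विद्या, written with explicit code points.
words = ["\u0924\u0926\u094d\u092d\u0935", "\u0935\u093f\u0926\u094d\u092f\u093e"]
half_da = "\u0926\u094d"  # द् = DA + VIRAMA, identical in both words

for word in words:
    i = word.index(half_da)
    next_letter = word[i + len(half_da)]
    print(word, "-> द् followed by", unicodedata.name(next_letter))
# तद्भव -> द् followed by DEVANAGARI LETTER BHA
# विद्या -> द् followed by DEVANAGARI LETTER YA
```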

In Windows XP and Vista, the complex text layout is enabled by default. In Windows 2000, you needed to enable it specifically. I have not tested it on Windows 7, but hopefully things should not go retrograde.

Finally

The description here is intended for a non-technical audience. Many concepts have been simplified and a purist technical person may be tempted to correct my usage of various terms (“You mean OS when you say computer!”). Let me just clarify that it is totally intentional. I just hope it has not become too technical for the non-technical audience 🙂

Questions are welcome as comments!

50 years later, son publishes father’s writings!

Sachendra Garg ‘Rashmi’ grew up listening to poets at various kavi-sammelans/Mushairas during his formative years in the 1950s, be it the ones organized at Lal-Quila and broadcast on radio on Republic Day, or the annual ones at Dayal Singh College, Karnal. With this exposure and encouragement from his teachers, he was soon writing poems himself, which were published in many contemporary magazines and anthologies under the pen-name Rashmi. He wrote a lot in the period 1955-62. A dream of publishing them as a collection was born at that time, but other callings of life took precedence and it never materialized.

Some 50 years later, his son Shaleen Garg decided to re-collect the poems he regarded as the “most valuable property the family has” and publish the collection.

The dream is now a reality. The book “Vihaan” (a name the poet had thought of way back in the early 1960s) was launched in Yamunanagar, Haryana in the presence of more than 150 people. The book was released (vimochan) by Dr. Ramesh Kumar, General Secretary of Mukund group of educational institutions. The four-hour programme also rightfully included a kavi sammelan. Some photographs are below. The book is available on Pothi.com.

Pothiz – Pothi.com’s e-Magazine

Over the last two years we have interacted with a lot of writers and readers, and one thing we have always wanted to do is bring readers and writers together. The self-publishing platform is one way, of course, but many other ideas also kept cropping up. We do not have time to implement all of them, but today we are acting on one of them.

We are announcing the launch of Pothiz – Pothi.com’s e-Magazine. Contributions are invited for the same. Selected entries will make it into the first issue of the magazine scheduled to come out in July 2009.

Details are available on Pothi.com’s website, and submissions can be made here.

To start with, the magazine will reach 2000+ people from our subscriber base, and more!

So, to all the existing and aspiring writers out there: what are you waiting for? Submit your entry right away! Let the creative juices flow.

And hey! Don’t forget to tell others around you about it.

Preview is for showing the book, not hiding it

Authors work hard on their books. It is, therefore, natural for them to be very protective of their manuscripts. However, all authors, especially new, unknown ones, have to carefully balance the threat of piracy against the threat of obscurity. If the book is not known, you can be sure it will not be pirated. But in that case it won’t be bought by anyone either. How much to open up your book and how much to protect it is probably a matter of endless debate, so we will not get into that right now. Instead, we will focus on a small feature at Pothi.com called “Preview”.

When submitting their books, authors can specify a certain portion of the book to be exposed for people to read online as the “Preview”. We mandate a minimum of 10 pages there. The reasoning is that most people publishing with us are first-time authors, and their manuscripts have not been vetted by a third party. So, it is important that potential readers get to see enough of the book to decide whether or not to buy it. We have set a minimum because we feel the exposed content should be enough to let users make up their minds about the book.

Some authors make good use of this feature. Let’s say you have written a novel, and you expose 60-70%, or even 90%, of it on the site as the “Preview”. What should you be scared of? That people will read it for free and not pay for it? Consider this: if somebody actually reads 90% of your book, he is probably sufficiently interested in it and will want to read the ending. He might end up paying for it. But if you circumvent the 10-page minimum by exposing only your table of contents and preface, the reader never gets a reason to be interested in the book and hence will never consider buying it.

The logic will have to adjust for different genres and forms, of course. Exposing 90% of a short story collection will not have the same effect (though it may still be useful for other reasons, e.g. the reader may be induced to buy your next book). But exposing around 40% of the book would be worthwhile. Somebody who reads two of your stories online might be interested enough to pay for the remaining three too. Somebody reading only the preface and table of contents may never bother.
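
If it helps to plan a preview, here is a small, hypothetical Python helper (it is not part of Pothi.com’s submission form) that turns a desired preview percentage into a page count while respecting the 10-page minimum described above. The book lengths in the examples are made up.

```python
import math

MIN_PREVIEW_PAGES = 10  # Pothi.com's minimum preview length

def preview_pages(total_pages: int, percent: int) -> int:
    """Pages to expose for a desired preview percentage, rounded up,
    never below the 10-page minimum and never more than the whole book."""
    wanted = math.ceil(total_pages * percent / 100)
    return min(total_pages, max(MIN_PREVIEW_PAGES, wanted))

print(preview_pages(200, 70))  # 140 pages of a 200-page novel
print(preview_pages(120, 40))  # 48 pages of a short story collection
print(preview_pages(60, 5))    # 10: the minimum applies
```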

Similarly, for non-fiction, you should expose at least one chapter (the more the merrier) with substantial content. Don’t put just the “Introduction” and “Foreword” in the preview. If there is something about the book you want readers to know, “Description” is the section for that. Make good use of the “Preview” feature and put an actual chapter in there. Let the reader find the solution to an actual problem and decide that she wants to read the rest of the book too.

So, if you want to update the “Preview” of your book to make it more meaningful, here is the FAQ detailing how to update your book.

Recent Changes on Pothi.com Platform

We have pushed some small changes to Pothi.com’s publishing platform recently. Here is a consolidated update:

  • Links to other books from the same category on Book Pages: As the number of books on the platform increases, discovery mechanisms are becoming important, so that potential readers can find books of their choice. We have started with a simple feature in this direction: book pages will now link to a few other books from the same genre, so people interested in similar books can browse around.
  • Fields for subtitle and ISBN: These were long overdue. There was no way for authors to specify a subtitle and ISBN for their books. Authors can now go to their book pages, edit them and include this information.
  • Introduction of Tags field on the book edit form: Tagging content helps in its search and discovery. There is now a “Tags” field that authors can populate for their books. Once a sufficient number of books are tagged, we will expose the tags suitably.
  • Setting MRP of the book, instead of setting the author margin: Until now, authors set the author margin for a book and our system calculated the price. This was counter-intuitive to many people, who wanted to set the final price of the book. We have changed that system, so authors can now set the MRP of the book and our system calculates the author earning (royalty). This does not change the underlying price/margin relationship: authors get the same earning for a given price as they did earlier. You can try out our new price estimator. If you have used the price estimator before, you will notice that there are now two tabs instead of one. The one called “For Yourself” lets you calculate the “Author Price”, the price at which authors can always buy copies, irrespective of how the book is priced on the store. The other, called “For Distribution”, lets you calculate the royalties at various MRPs. Again, the underlying logic has not changed. The earlier estimator also showed the author’s price in addition to the store price; we have only rearranged the forms for the convenience of users.

Some more exciting stuff is in the making right now. Stay tuned for them 🙂

Converting Word Files to PDF

At Pothi.com, PDF is the best format to submit your book in. Any other format you submit gets converted to PDF, and that PDF is used for printing. If the book is submitted in a format other than PDF and the conversion is done at our end, there might be issues. For example, if you have used a font that is not available on our system, the converted document won’t have that font and will not look good.

It’s no surprise that the most common format we receive books in is MS Word. So, here are a few tips for converting MS Word files to PDF.

Popular PDF converters

  1. Adobe Acrobat: This is a paid option from Adobe. It is expected to create the most standards-compliant PDFs. Some POD providers insist that PDFs submitted to them be created with Adobe’s product. Pothi.com does not have that requirement, but if you do have Adobe Acrobat, you should use it. Once you have the product installed, you will have “Adobe PDF” as a printer option. Press “Ctrl-P” and select “Adobe PDF” as the printer. If you just click “OK” from here and save the resulting file, it will create a PDF, but with one of the most common problems we see in files submitted as PDF: the page size in the resulting PDF won’t be the page size you had set up for the book. It will be Letter (8.5″x11″) or A4 (8.27″x11.69″). We will discuss how to correct this later in this article.
  2. Office 2007 – Save as PDF: If you are using Microsoft Office 2007, you can install and use its “Save as PDF or XPS” plugin. Once you have this plugin installed, the option to save as PDF will be available in your Office menu. This option works out of the box, and the PDF file created will have the correct page size too.
  3. CutePDF Writer: This is a free option and works very well. You can download the free CutePDF Writer. After you install it, “CutePDF Writer” will be one of the printers listed when you press “Ctrl-P” in your MS Word document. You can create a PDF file by selecting this printer and clicking the “OK” button. It will, however, have the same problem as “Adobe PDF” of not creating the right page size. How to solve this is discussed next in the article.
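
To check whether a converted PDF kept the book’s page size, it helps to know that PDF page sizes are measured in points, 72 to the inch. The Python sketch below is our own illustrative helper, not a Pothi.com tool: given a page’s width and height in points (however you obtain them, for example from a PDF library), it reports whether the page matches the intended size or fell back to Letter or A4.

```python
POINTS_PER_INCH = 72  # PDF measures page sizes in points

# Common default sizes, in inches (A4 is 210 x 297 mm).
DEFAULT_SIZES = {
    "Letter": (8.5, 11.0),
    "A4": (210 / 25.4, 297 / 25.4),
}

def check_page_size(width_pt: float, height_pt: float,
                    intended_in: tuple, tolerance_in: float = 0.01) -> str:
    """Compare a PDF page size (in points) with the intended size (in inches)."""
    w, h = width_pt / POINTS_PER_INCH, height_pt / POINTS_PER_INCH

    def matches(size: tuple) -> bool:
        return abs(w - size[0]) <= tolerance_in and abs(h - size[1]) <= tolerance_in

    if matches(intended_in):
        return "OK"
    for name, size in DEFAULT_SIZES.items():
        if matches(size):
            return f"page is {name}: set the page size before converting"
    return f"page is {w:.2f} x {h:.2f} in, not {intended_in[0]} x {intended_in[1]} in"

print(check_page_size(360, 576, (5, 8)))  # OK
print(check_page_size(612, 792, (5, 8)))  # page is Letter: set the page size before converting
```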

Getting the right Page Size in the converted PDF


Right Size in the MS Word file

The first step is to ensure that you have set up the intended page size in MS Word itself. The default page size in MS Word is either A4 or Letter, and most of the time a print book will not look good in these sizes. You can check out these MS Word formatting FAQs on Pothi.com to make sure you have set the correct page size in MS Word. A list of supported Book Page Sizes on Pothi.com is available in our FAQs.

Once this is taken care of, the “Save as PDF or XPS” plugin for Office 2007 works out of the box.

For Adobe PDF and CutePDF Writer, you need to set the correct page size.


Adobe PDF

After pressing “Ctrl-P” and selecting “Adobe PDF” as the printer, click “Properties”. A settings screen opens. If it does not show the page size options, make sure you click on the “Adobe PDF Settings” tab.

Click on the “Adobe PDF Page Size” drop down. If your intended size is there in the drop down, you can select it from here, click “OK” and proceed as usual.

If the page size is not there, click the “Add” button next to the drop down. It opens a screen where you can define a new paper size.

Enter a name of your choice in “Paper Names” and enter the intended width and height in “Paper Size”. Remember to select the right “Unit” (inch or mm). For example, entering “My Book Size” as the name, with a width of 5 and a height of 8 in inches, creates a 5″x8″ size that you will be able to use going forward.
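
Picking the wrong unit is an easy slip: a width of “5” with the unit set to mm gives a page only 5 mm wide. If you prefer to work in millimetres, the conversion is exact (1 inch = 25.4 mm), as this small Python sketch shows for a 5″x8″ book and for the width of an A4 page.

```python
MM_PER_INCH = 25.4  # exact by definition

def inches_to_mm(value_in: float) -> float:
    return value_in * MM_PER_INCH

def mm_to_inches(value_mm: float) -> float:
    return value_mm / MM_PER_INCH

# A 5"x8" book page, expressed in millimetres for the "Paper Size" fields.
width_mm = round(inches_to_mm(5), 1)
height_mm = round(inches_to_mm(8), 1)
print(width_mm, height_mm)         # 127.0 203.2

# Going the other way: the 210 mm width of an A4 page in inches.
print(round(mm_to_inches(210), 2))  # 8.27
```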

Click “Add/Modify”, then click “OK”. You will be back at the print screen with the printer list.


Now click on the “Properties” again and this time, “My Book Size” will be available in the “Adobe PDF Page Size” drop down.

Select the “My Book Size”, click “OK” and proceed to create the PDF file.


Cute PDF

In CutePDF the option for changing the page size is almost hidden.

Press “Ctrl-P”, select CutePDF Writer as the printer and then click “Properties”. On the resulting screen, select the “Paper/Quality” tab.

Now click the “Advanced” button. On the resulting screen, click the “Paper Size” drop down. If your intended paper size is available, choose it.

Otherwise, select “Post Script Custom Paper Size”. The resulting screen lets you specify the page size. Enter a suitable width and height, and remember to select the right unit (inch or mm). For example, a width of 5 and a height of 8 in inches makes the page size 5″x8″.

 

Click “OK” (4 times) to create the PDF of the desired size.

Queries?

Have more queries about submitting PDF files on Pothi.com? Post them as comments here and we’ll try to answer them.

Now get instant feedback on common file errors for your book

Pothi.com is open to, and even encourages, authors preparing their books on their own. The specifications are available in detail on our website.

However, some common errors often show up in the files submitted. Until now, authors had to wait for our verification process to get feedback on them. We have now introduced an automated file checker in the book submission form, which instantly analyzes the submitted files for common errors like the page size of the interior and the size of the cover. So, you don’t have to wait to correct those.

Enjoy the power and contact us if you face any problems with the new system.

You Make the Story – We Make the Books!

Shachii and Gaurav Manik are a Mumbai-based couple who run a software company as their day job. However, it is their kids at home (two boys aged 5 and 3) who do a great job of challenging their creativity. Gaurav loves to make up and tell stories, which he creates with the help of the two boys.

This family exercise made them realize how creative kids’ minds can be, and the idea of “You Make the Story – The Creativity Workshop for Kids” was born. During the workshop, the kids are encouraged to come up with a basic story plot and draw pictures of a scene from it. Then Shachii and Gaurav expand on the story and make it comprehensive, create and design the book, publish it and have it printed from Pothi.com. Two books from their earlier workshops are available on Pothi.com: “Adventures of the Red Club” and “The Amazing Journey to Zog”. A copy of the book is given to each kid who helped create it.

The workshop is targeted at kids between the ages of five and ten. Each workshop is a one-time, 90-minute session, and the duo is open to conducting it in any locality in Mumbai. “We think it is a great way to bring out the imagination of a child. They seem to enjoy working in a group to create a story and draw the pictures. In the end it gives them a personalized book, an ‘I helped create this!’ feeling, and a special bond with the other kids who were present in the same session,” says Shachii about the workshop.

You can follow Shachii on Twitter.

Publishing books from the workshop is a great example of the use of Print on Demand. It does not take a lot of money to become a publisher and the encouragement the children get from seeing their work in a book is invaluable.

What are your kids doing this summer? Encourage them to express their thoughts and become a publisher for them right away!

Check out some other awesome books created by children which are available through Pothi.com. We will also try and bring to you glimpses from some of the privately published projects this month.

Now restrict the shipping to India only

We have been shipping the books internationally and this is very convenient for most of our authors and readers.

However, in certain circumstances, authors need to be able to restrict shipping from Pothi.com to India. This generally happens in one of two cases:

  • The rights of the book outside India are with someone else, and hence the author cannot sell the book published through Pothi.com there.
  • The author wants to introduce a low price version specifically for Indian market.

So, we have recently pushed out a feature that lets you specify that your book should be shipped only within India. If Pothi.com is the only place where you have published your book, you probably do not need to bother about it. But if your case is either of the two mentioned above, you can use this feature. Log in to your account, go to your book page and click the edit link. Under the field “This book can ship to”, select “Only India” and save the book. The book will then not be shipped to any location outside India.

Even more power to our authors! 🙂