Driving a character LCD using PIC24 Enhanced Parallel Master Port

E strobe train

Traditionally, Hitachi HD44780-compatible alphanumeric LCD displays are driven by bit-banging bus signals combined with long delays between sending commands and data. In many cases this method is good enough. In other cases, however, extra CPU cycles are not available and a more economical method of driving a display is needed. I’m currently working on a design involving very fast USB exchanges combined with occasional LCD output, and I developed a solution which works very well for me. I’m posting it in the hope that my fellow developers will find it useful.

HD44780 displays have been around for a long time. The internet provides plenty of posts about them, code samples and even a Wikipedia article. My favorite introductory text on the topic is Dincer Aydin’s LCD Info page.

PIC24 16-bit microcontrollers from Microchip have been around for some time as well. They are cheap and powerful, and the Microchip C30 compiler (free version available) is quite good. They are not as popular as their 8-bit counterparts from Microchip and Atmel, so good PIC24 resources are scarce. One nice introductory text on the topic can be found at Engscope.

Since I’m trying to minimize the CPU time spent driving the LCD, let’s first talk about timing in general. When developing for the HD44780 we need to deal with three different timings. The first is the timing of the display part – the screen we see. LCD glass is very slow. When we attempt to update the screen faster than, say, twice a second, the symbols become blurry and pale. The fastest display in my collection still looks OK when updated at a 4Hz rate (250ms), while most others are twice as slow.

On the other hand, display data bus timing is many times faster. In order to write to the display we first need to set the RS, RW and data lines, wait a little, then assert the E line, wait some more and then de-assert it. If we are reading from the display we will also need to wait a little more after de-asserting E before we can read the data on the bus. The total bus cycle length is ~2.5us, which is 200,000 times shorter than the update period of typical LCD glass (about 500ms). This time is pretty short but the MCU is still faster – a PIC24F clocked at 32MHz has an instruction cycle of 62.5ns and in 2.5us it will be able to execute 40 instructions. Therefore, no matter how simple it looks, it is preferable not to bit-bang the bus.
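
The arithmetic behind these figures can be checked with a few lines of host-side C. This is only a sketch: the function names are made up for illustration, and it assumes the usual PIC24F relationship of one instruction cycle per two oscillator cycles.

```c
#include <stdint.h>

/* Instruction clock of a PIC24F: Fcy = Fosc / 2 (assumed, no clock divider). */
static uint32_t fcy_hz(uint32_t fosc_hz)
{
    return fosc_hz / 2u;
}

/* Whole instruction cycles that fit into an interval given in nanoseconds.
   A 64-bit intermediate keeps the multiplication from overflowing. */
static uint32_t cycles_in_ns(uint32_t fosc_hz, uint32_t interval_ns)
{
    return (uint32_t)(((uint64_t)fcy_hz(fosc_hz) * interval_ns) / 1000000000ull);
}
```

With a 32MHz clock, cycles_in_ns(32000000, 2500) gives the 40 instruction cycles per 2.5us bus cycle quoted above.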

The third timing we need to deal with is command execution time. All but two LCD commands have a stated execution time of 40us. The two slow commands, Clear and Home, require 1.64ms to finish. Those are datasheet numbers; in reality a fast command on a modern display may finish in as little as 10us, and slow commands on an old display can take as much as 3.5ms, depending on the age and the particular “HD44780-compatible” controller used. Even the slow commands are about 100 times faster than the glass.

In order to drive my LCD efficiently I did the following:

  1. Assigned the LCD pins to be driven by the Enhanced Parallel Master Port (EPMP) peripheral available in the PIC24FJ256GB206 part I’m using. This allows starting (and completing) a bus cycle by simply “writing” to a certain memory address.
  2. Created a circular buffer for commands and data to be sent to the LCD. This allows the LCD to be accessed asynchronously: the application places a string of characters into the queue and the LCD outputs them at its own pace. To be able to use the same byte-wide queue for both commands and data, I’m inserting a special “flag” character in front of each command.
  3. Wrote a timer interrupt routine to read the queue, send commands/data and wait the necessary amount of time between commands. Using the timer allows the application to keep executing during periods of waiting. The timer routine also switches the timer delay according to the type of the last command sent to the LCD.

The rest of the article talks about the implementation details. For demonstration purposes I wrote a simple test application, the most interesting pieces of which are explained below. The code for the application was largely copied/pasted from another project, so the choice of MCU part, crystal speed and timer looks rather arbitrary.

I will start by explaining the EPMP piece. The peripheral allows creating parallel memory interfaces with different bus and data sizes, as well as control signals with programmable timing. It is similar to the previous version of the peripheral called PMP, i.e., Parallel Master Port. An example of a PMP-driven LCD is Microchip’s ever popular Explorer 16 board. Another example of using PMP to drive an LCD is given in Lucio Di Jasio’s book. In my application, the pinout is very similar to the one used in the sources mentioned: the EPMP PMA line is used for RS, PMRD is used for RW, PMWR is used for E. For simplicity I’m using an 8-bit LCD interface and 8-bit EPMP.

Unlike PMP, EPMP uses extended memory space to access the bus. This is how the definitions of the LCD command and data registers look:

//address allocation for LCD registers
__eds__ uint8_t __attribute__((noload, section("epmp_cs1"), address(CS_BASE))) LCDCMD __attribute__((space(eds)));
__eds__ uint8_t __attribute__((noload, section("epmp_cs1"), address(CS_BASE))) LCDALIGN __attribute__((space(eds)));
__eds__ uint8_t __attribute__((noload, section("epmp_cs1"), address(CS_BASE))) LCDDATA __attribute__((space(eds)));

The first line defines the LCD command register, the third line defines the LCD data register, and the second line is used to align the data register to a 16-bit word boundary. This is done so that RS is cleared when accessing LCDCMD and set when accessing LCDDATA: PIC24 is a 16-bit MCU and each memory address addresses a 2-byte word.

Now, let’s program the EPMP registers in order of appearance in the corresponding PIC24F Family Reference Manual, AKA FRM, the first being PMCON1:

PMCON1bits.ADRMUX = 0;      // address is not multiplexed
PMCON1bits.MODE = 3;        // master mode
PMCON1bits.CSF = 0;         // PMCS1 pin used for chip select 1, PMCS2 pin used for chip select 2
PMCON1bits.ALMODE = 0;      // "smart" address strobes are not used
PMCON1bits.BUSKEEP = 0;     // bus keeper is not used
PMCON1bits.IRQM = 0;        // interrupt at the end of rd/wr cycle

The bit settings are self-explanatory. Basically, I set non-multiplexed address lines and master mode of the module.

The PMCON2 register can be left at default. The PMCON3 looks like this:

PMCON3bits.PTWREN = 1;      // enable write (WR/ENABLE) strobe port
PMCON3bits.PTRDEN = 1;      // enable read (RD/WR) strobe port
PMCON3bits.AWAITM = 0;      // set address latch pulses width to 1/2 Tcy
PMCON3bits.AWAITE = 0;      // set address hold time to 1/4 Tcy

Here I enable READ and WRITE signals and set address latch signal (which is not used) delay to the minimum. Later, I will combine READ and WRITE to a single RW and make E(NABLE) out of WRITE.

The PMCON4 configuration is very simple:

PMCON4 = 0x0001;            // PMA0 address line is enabled

Here I enable a single address line which will serve as LCD RS signal switching between command and data registers.

The PMCS1CF register defines the behaviour of lines used as LCD RW and E lines. They are tied to EPMP CS signal so we need it active even though we don’t need CS to drive an LCD.

PMCS1CFbits.CSDIS = 0;     // enable CS function
PMCS1CFbits.CSP = 1;       // CS1 polarity
PMCS1CFbits.CSPTEN = 0;    // disable CS port
PMCS1CFbits.BEP = 1;       // byte enable polarity
PMCS1CFbits.WRSP = 1;      // write strobe polarity - enable active high
PMCS1CFbits.RDSP = 1;      // read strobe polarity, READ high, WRITE low
PMCS1CFbits.SM = 1;        // read/write and enable strobes
PMCS1CFbits.PTSZ = 0;      // data bus width is 8 bit

Even if the CS signal is not used we still need to configure it, since READ/WRITE and ENABLE depend on it. CSDIS enables the CS function and CSPTEN disables the CS pin. On the part I’m using it is combined with an upper PMA address line which has already been disabled in PMCON4. The CS is separate on 100-pin parts so I’m disabling it a second time here just in case. BEP can be any value since it’s not used. WRSP sets the E polarity (active high), RDSP sets 0 for write and 1 for read to be used as the LCD RW line. SM combines READ and WRITE into a single pin and sets a separate ENABLE. Finally, PTSZ sets the data bus width to 8 bits.

PMCS1BS = (CS_BASE>>8);     // CS1 start address

PMCS1BS sets the starting extended memory address for EPMP. An attentive reader may have noticed that this is the same address used in the LCDCMD and LCDDATA definitions. I’m using the 0x20000 address, which is the default; technically, in this case I don’t need to set PMCS1BS since it contains the same value at power-on. I just wanted to show how it’s done.

The last interesting piece of EPMP configuration is the timing of all the important signals. The necessary bits are contained in the PMCS1MD register, as follows:

PMCS1MDbits.DWAITB = 3;      // time from RS,RW to E
PMCS1MDbits.DWAITM = 0x08;   // E strobe length - 450ns by spec
PMCS1MDbits.DWAITE = 3;      // time from E to valid data

These times were chosen conservatively. In my experience, for modern displays they can be made much shorter and even set to zero. The speed advantage is very small and only appears when LCD reads are performed.

The last thing to do is to enable the EPMP module:

PMCON1bits.PMPEN = 1;        // enable the module

After this is done, writing to an LCD is as simple as doing LCDCMD = command for commands or LCDDATA = data for data. Since a bus cycle is longer than a single instruction cycle and EPMP has no buffer in master mode we would have to wait between issuing consecutive writes. However, since we will also need to wait for the LCD to digest what’s been sent to it and this wait time is substantially longer than a bus cycle, we don’t need to worry about that.

Reading the LCD does involve waiting since the data become available on the bus at the end of a cycle. I don’t use LCD read in my code but here is a short example borrowed from the FRM:

value = LCDCMD;         //dummy read
while(PMCON2bits.BUSY); // wait for the end of bus cycle
value = PMDIN1;         // real read

First line initiates the bus cycle, second line waits for the bus cycle to complete, last line reads the value from the bus at the falling edge of E. This example reads from the command register and can be used to check BUSY flag. Also, if times in PMCS1MD are all set to zero wait states the bus cycle will take one instruction cycle and checking EPMP busy flag won’t be necessary.
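
Once the status byte has been read from the command register, the HD44780 reports the busy flag in bit 7 and the current address counter in the lower seven bits. A minimal host-side sketch of decoding it follows; the helper and macro names are hypothetical and not part of the original program.

```c
#include <stdint.h>
#include <stdbool.h>

#define LCD_BUSY_FLAG 0x80u   /* bit 7 of the status byte: controller busy */
#define LCD_ADDR_MASK 0x7Fu   /* bits 6..0: address counter */

/* True while the controller is still executing the previous command. */
static bool lcd_status_busy(uint8_t status)
{
    return (status & LCD_BUSY_FLAG) != 0u;
}

/* Address counter value carried in the same status byte. */
static uint8_t lcd_status_address(uint8_t status)
{
    return (uint8_t)(status & LCD_ADDR_MASK);
}
```

On the target, the argument would be the value obtained by the read sequence shown above.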

To feed the data to the LCD I’m using a simple one-way circular buffer, AKA queue, defined like this:

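//LCD buffer size - must be power of 2
#define LCD_TX_BUFSIZE 256
#define LCD_TX_BUFMASK ( LCD_TX_BUFSIZE - 1 )
#if ( LCD_TX_BUFSIZE & LCD_TX_BUFMASK )
#error LCD Tx Buffer size is not a power of 2
#endif
//LCD buffer
uint8_t LcdTx_Buf[LCD_TX_BUFSIZE];
uint8_t LcdTx_Head;             //moved by the producer (application)
volatile uint8_t LcdTx_Tail;    //moved by the consumer (timer ISR)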

To make queue management easier the size must be a power of 2. A size of 256 saves a couple of instruction cycles; if memory size is more important, the buffer size can be decreased. LcdTx_Head is moved by the producer of the data and LcdTx_Tail is moved by the consumer, as will be explained later in the article. LcdTx_Buf is the buffer itself. A very detailed explanation of this type of circular buffer is given in Fred Eady’s excellent Networking and Internetworking with Microcontrollers book on pages 51-70.
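
The reason for the power-of-2 requirement is that wrapping an index becomes a single AND with the mask. The following host-side sketch shows the index arithmetic for a smaller, 16-byte buffer; the names are illustrative and not taken from the program.

```c
#include <stdint.h>
#include <stdbool.h>

#define BUFSIZE 16u              /* must be a power of 2 */
#define BUFMASK (BUFSIZE - 1u)

/* Next index, wrapping to 0 at the end of the buffer. */
static uint8_t ring_next(uint8_t idx)
{
    return (uint8_t)((idx + 1u) & BUFMASK);
}

/* The queue is empty when head and tail coincide. */
static bool ring_empty(uint8_t head, uint8_t tail)
{
    return head == tail;
}

/* The queue is full when advancing the head would hit the tail,
   so one slot always stays unused. */
static bool ring_full(uint8_t head, uint8_t tail)
{
    return ring_next(head) == tail;
}

/* Number of bytes waiting to be consumed. */
static uint8_t ring_used(uint8_t head, uint8_t tail)
{
    return (uint8_t)((head - tail) & BUFMASK);
}
```

With a 256-byte buffer and uint8_t indices the AND can be dropped altogether, because the index wraps on its own when it overflows. That is where the saved instruction cycles come from.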

I will now explain the timer interrupt service routine (ISR) which consumes the queue and sends data to LCD using EPMP. The routine gets called each time the timer overflows. Two distinctive time intervals are used – one for fast commands and data and the second one for slow commands Clear and Home. When the queue is empty the timer is stopped since there is no reason to run it anymore. When data is placed into the queue the timer is started again. Commands in the queue are preceded by a special “flag”: the ISR tracks that and sends data to either command or data register. The following (rather long) listing demonstrates how all this is coded. The interesting lines are explained after the listing.

//Timer interrupt
#define TIMER3_ISR_PRIO 1
void __attribute__((__interrupt__, auto_psv)) _T3Interrupt(void)
{
    static uint8_t state = 0;

    _T3IF = 0;    //clear interrupt flag
    LcdTx_Tail++;    //advance the tail
#if LCD_TX_BUFMASK < 255
    LcdTx_Tail &= LCD_TX_BUFMASK;
#endif
    switch( state ) {
    case 0:    //read byte, send data
        if( LcdTx_Buf[ LcdTx_Tail ] == CMDFLAG ) {    //next byte is a command
            TMR3 = PR3 - 20;    //shorter cycle. Must be set longer than the execution time of the rest of the ISR
            state = 1;
        }//if( LcdTx_Buf[ LcdTx_Tail ] == CMDFLAG...
        else {
            LCDDATA = LcdTx_Buf[ LcdTx_Tail ];    //send data
            PR3 = BSP_TMR3_PER_SHORT;
        }
        break;
    case 1:    //send command
        LCDCMD = LcdTx_Buf[ LcdTx_Tail ];    //send command
        if( LcdTx_Buf[ LcdTx_Tail ] < 4 ) {  //slow command
            PR3 = BSP_TMR3_PER_LONG;
        }
        else {
            PR3 = BSP_TMR3_PER_SHORT;
        }
        state = 0;
        break;
    }//switch( state...
    if( LcdTx_Head == LcdTx_Tail ) {    //stop the timer
        T3CONbits.TON = 0;
    }
}

  • #define TIMER3_ISR_PRIO 1: the interrupt priority is set to the lowest value. We don’t need to serve it very fast. This definition is used during timer initialization shown later in the article
  • __attribute__((__interrupt__, auto_psv)): compiler instruction stating that the function is the ISR
  • static uint8_t state: in order to “remember” that the previous byte in the queue was a command flag, the ISR is written as a simple two-state state machine. This variable holds the state
  • LcdTx_Tail++: advance the “tail” of the buffer
  • switch( state ): the state machine
  • if( LcdTx_Buf[ LcdTx_Tail ] == CMDFLAG ): check whether the byte at the tail is a command flag, i.e. whether the next byte in the queue is a command
  • TMR3 = PR3 - 20 and state = 1: if yes, change state to the next one. Also, since nothing has been sent to the LCD we don’t need to wait and can read the next byte immediately. An interrupt occurs when TMR3 equals PR3, and the timer counter runs all the time until stopped explicitly. Therefore, if the number of counts left before TMR3 reaches PR3 is less than the number of instruction cycles necessary to finish the ISR, the interrupt will occur immediately after return. It is also possible that the byte just read was the last one in the queue – in this case the application will never have a chance to place another byte in the queue. For this reason, the number of remaining timer counts should be assigned generously
  • else branch of case 0: if no, write the byte to the LCD data register and set the timer period to the “short command” interval
  • break: jump past the closing brace of the switch statement
  • case 1: the next state, which sends the command
  • LCDCMD = LcdTx_Buf[ LcdTx_Tail ]: sending the command
  • if( LcdTx_Buf[ LcdTx_Tail ] < 4 ): checking if the command is long or short. Long commands have command codes 1 and 2, the next command code is 4
  • PR3 = BSP_TMR3_PER_LONG or PR3 = BSP_TMR3_PER_SHORT: load the period register with the value necessary for the command delay. This is the beauty of 16-bit timers – when clocked directly from the system clock the range for such a timer is from 62.5ns to 4ms
  • state = 0: switching back to the previous state to analyze the next byte in the queue
  • if( LcdTx_Head == LcdTx_Tail ): stop the timer if the queue is empty. The next time it is started, counting will continue from where it stopped, so even if the next character is placed in the queue immediately after returning from the ISR the delay condition will be satisfied
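
The decision logic of the ISR can be tried out on a PC without any hardware. The sketch below is a host-side model, not the ISR itself: register writes are replaced by a returned record, the value of CMDFLAG is an assumption (the article does not show it), and a period of 0 stands for the “fire again almost immediately” case that the real code achieves with TMR3 = PR3 - 20.

```c
#include <stdint.h>

#define CMDFLAG   0xFFu    /* hypothetical flag value, chosen for this sketch */
#define PER_SHORT 799u     /* same numbers as BSP_TMR3_PER_SHORT / _LONG */
#define PER_LONG  35000u

enum target { TO_NONE, TO_DATA, TO_CMD };

struct lcd_step {
    enum target target;    /* LCD register that would be written */
    uint8_t     value;     /* byte written, if any */
    uint16_t    period;    /* timer period chosen for the wait; 0 = no wait */
};

/* One consumer step. *state is 0 when a data byte or a flag is expected
   and 1 when the byte is a command announced by a preceding flag. */
static struct lcd_step lcd_consume(uint8_t *state, uint8_t byte)
{
    struct lcd_step s = { TO_NONE, 0u, 0u };

    if (*state == 0u) {
        if (byte == CMDFLAG) {
            *state = 1u;               /* nothing sent, next byte is a command */
        } else {
            s.target = TO_DATA;
            s.value  = byte;
            s.period = PER_SHORT;
        }
    } else {
        s.target = TO_CMD;
        s.value  = byte;
        s.period = (byte < 4u) ? PER_LONG : PER_SHORT;  /* Clear = 1, Home = 2 */
        *state   = 0u;
    }
    return s;
}
```

Feeding it the sequence flag, 0x02, 'A' yields no write, then a command write with the long period, then a data write with the short period. One consequence of this encoding is that a data byte equal to the flag value cannot be displayed.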

The following piece of code shows the initialization of this timer:

//Setup timer 3 for LCD
T3CON  = 0x0000;                     /* Use Internal Osc (Fcy), 16 bit mode, no prescaler */
PR3    = BSP_TMR3_PER_SHORT;         /* set the period */
TMR3   = PR3 - 1;                    /* one count before interrupt */
_T3IP  = TIMER3_ISR_PRIO;            /* set Timer 3 interrupt priority */
_T3IF  = 0;                          /* clear the interrupt for Timer 3 */
_T3IE  = 1;                          /* enable interrupt for Timer 3 */
//we don't want to start this timer

The timer will be turned on when the first byte is placed into the queue. Since the LCD is ready to take it at that time, TMR3 is set one count below PR3 so that the interrupt happens almost immediately.

We will now take a look at the producer part – what needs to happen in the application to place a byte in the LCD queue. This is done by the LcdSendByte() function:

/* Places a byte to the LCD queue. Can be used to send data */
void LcdSendByte(uint8_t byte) {
    uint8_t tmphead = LcdTx_Head + 1;
#if LCD_TX_BUFMASK < 255
    tmphead &= LCD_TX_BUFMASK;
#endif
    while( tmphead == LcdTx_Tail );	//this line blocks - keep buffer large enough
    LcdTx_Buf[ tmphead ] = byte;
    LcdTx_Head = tmphead;
    T3CONbits.TON = 1;    //start the timer in case it was stopped
}
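
  • uint8_t tmphead = LcdTx_Head + 1: the next head position is computed first; this is done to be able to “see” the tail index of the buffer
  • while( tmphead == LcdTx_Tail ): if the buffer is full, wait
  • LcdTx_Buf[ tmphead ] = byte: place a byte into the queue
  • LcdTx_Head = tmphead: advance the buffer “head”
  • T3CONbits.TON = 1: turn on the timer in case it was turned off by the ISR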

This function can be used to send character data to the LCD. To send a command we need to insert a flag before it. The LcdSendCmd() function does just that:

/* Places a command flag to the LCD queue followed by a byte */
void LcdSendCmd(uint8_t cmd) {
    LcdSendByte( CMDFLAG );   //insert command flag symbol
    LcdSendByte( cmd );
}
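
In the same spirit, a string helper can be layered on top of LcdSendByte(). The sketch below is an assumption of how such a helper might look; LcdSendString() is not part of the original program. To keep the example self-contained and testable on a PC, LcdSendByte() is replaced here by a stand-in that records the bytes instead of queueing them.

```c
#include <stdint.h>
#include <stddef.h>

/* Host-side stand-in for LcdSendByte(): records bytes in an array.
   On the target the real function from the article would be used. */
static uint8_t sent[32];
static size_t  sent_len;

static void LcdSendByte(uint8_t byte)
{
    if (sent_len < sizeof sent) {
        sent[sent_len++] = byte;
    }
}

/* Hypothetical helper: queue every character of a zero-terminated string. */
static void LcdSendString(const char *s)
{
    while (*s != '\0') {
        LcdSendByte((uint8_t)*s++);
    }
}
```

Because the real LcdSendByte() blocks when the queue is full, a string longer than the free space in the buffer would stall the caller until the ISR catches up.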

We now have everything necessary to use the LCD. The following is a main() routine which first initializes both the MCU and the LCD and then fills the first four screen positions with a BSD-style “rolling stick” character. Note that initialization commands are placed directly into LCDCMD – this is because during initialization the wait time between commands must be made much larger.

int main( void )
{
    //initialization commands for standard 16x2 LCD
    const uint8_t lcd_init_seq[] = { FUNC_SET, DISP_CTRL, LCD_CLEARDISPLAY, ENTRY_MODE, 0 };	//initialization sequence
    const uint8_t* lcd_init_p = lcd_init_seq;	//pointer to the first element
    const uint8_t rollchar[4] = {'/','-','\\','|'};
    uint8_t roll_idx = 0;
#define ROLL_IDX_MASK 0x03

    //MCU, EPMP and Timer 3 initialization described earlier goes here

    while( *lcd_init_p ) {	//power-on display initialization
        __delay_ms( 30 );
        LCDCMD = *lcd_init_p++;	//place a byte directly on the LCD bus
    }
    while( 1 ) {	//output rolling characters in the first 4 positions of the display
        uint8_t i;
        LcdSendCmd( LCD_RETURNHOME ); //Home the screen - slow command
        for( i = 0; i < 4; i++ ) {
            LcdSendByte( rollchar[ roll_idx ] ); //fast command
        }
        roll_idx++;	//next "stick" position
        roll_idx &= ROLL_IDX_MASK;
        __delay_ms( 1000 );
    }//while( 1 )
}

The GitHub repo mentioned in the beginning contains a single file with the program. In order to use it you need to compile it with the Microchip C30 compiler (I use version 3.31) and load it to the PIC24 micro. You will also need to connect the LCD; the pinout depends on the part. For the PIC24FJ256GB206 the pinout is this:

  • Pin 30 – RS
  • Pin 53 – RW
  • Pin 52 – E
  • Pin 60 – D0
  • Pin 61 – D1
  • Pin 62 – D2
  • Pin 63 – D3
  • Pin 64 – D4
  • Pin 1 – D5
  • Pin 2 – D6
  • Pin 3 – D7

The program may work on the aforementioned Explorer 16 board equipped with an EPMP-capable MCU – if you have one, please try it and let me know the result. The title picture shows an oscilloscope screenshot of the E strobe when the program is running – Home, long wait, then 4 characters.

The program can be modified for different CPUs and crystal speeds. Also, it is possible to fine-tune the times. Simply change the intervals for short/long commands and see if the screen still looks good. The definitions look like this:

//timer period for fast and slow commands
#define BSP_TMR3_PER_SHORT	799     //Timer period for fast commands
//#define BSP_TMR3_PER_SHORT 2000
#define BSP_TMR3_PER_LONG		35000   //Timer3 period for slow commands
//#define BSP_TMR3_PER_LONG 55000

It should be noted that total execution time (or CPU time) is the same in all cases so it is not necessary to set timer period precisely. In most cases, one or the other set of numbers will be good enough.
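
When porting to a different clock it may help to convert between period-register values and real time. The following host-side sketch does that under two assumptions: the instruction clock is 16MHz (32MHz oscillator), and the timer runs without a prescaler, so a period value of PR gives PR + 1 counts. The function names are invented for this example.

```c
#include <stdint.h>

#define FCY_HZ 16000000ul   /* assumed instruction clock: 32MHz / 2 */

/* Interval in nanoseconds produced by a period-register value
   (PR + 1 timer counts, no prescaler). */
static uint32_t pr_to_ns(uint16_t pr)
{
    return (uint32_t)((((uint64_t)pr + 1u) * 1000000000ull) / FCY_HZ);
}

/* Smallest period-register value giving at least interval_us microseconds,
   clamped to the range of a 16-bit timer. */
static uint16_t us_to_pr(uint32_t interval_us)
{
    uint64_t counts = ((uint64_t)interval_us * FCY_HZ + 999999ull) / 1000000ull;

    if (counts == 0u) {
        counts = 1u;
    }
    if (counts > 65536u) {
        counts = 65536u;
    }
    return (uint16_t)(counts - 1u);
}
```

Under these assumptions the value 799 corresponds to 50us and 35000 to roughly 2.19ms, both comfortably above the datasheet figures of 40us and 1.64ms, and the largest possible period (65535) gives 4.096ms.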


5 comments to Driving a character LCD using PIC24 Enhanced Parallel Master Port

  • Many years ago I write a kernel module for these displays attached to the parallel port.
    The module is very dated, and is for 2.0/2.1/2.2 kernels.
    Timing was never that much of an issue, just as long as the specs are followed. I stopped developing it because kernels started changing APIs too fast and in incompatible ways.

  • Arturo

    i’m testing your code for a little project using an LCD with a PIC24FJ256DA210 and works very well but the only thing that don’t understand how to do it is, how do i change the line? i mean the carriage return and line feed commands

    Thanks and sorry if is misspelled, english is not my native language.

  • Arturo

    Arturo again, i found how, adding another command definition with the hex value C0, makes the new line. Thanks anyway.

  • Vismay


    I tried your code and it works well for PMWR(PMENB) as Chip select, however when i try to change Chip select from PMENB to PMCS1 it doesn’t work. I need to use PMCS1 and PMCS2 because the i am using 40×4 lcd which has two chip selects. Do you have any suggestions ?