JTAG is a standard (IEEE Std 1149.1) for boundary scan testing. In short, it is a way to control nearly all the pins on a chip that has a JTAG port, for the purpose of testing already assembled devices. By sending commands, you can set some of the pins on a chip as inputs, some as outputs, and some as tristate. You can then drive a set of data out of the outputs and capture what arrives at the inputs of the chip.
Further, JTAG can be used to drive a chain of such chips. You could have a CPU as one device on the JTAG chain and a CPLD as another, and use JTAG to test all the lines running between the two chips. As a side effect, JTAG can be used for in-circuit control of flash chips. This can simplify manufacturing (your boards are preassembled, and you use JTAG to download the programming into the flash chip), salvage a bad reprogramming job (say, from trying to put Linux on a Windows CE device), or load programs into programmable logic devices such as CPLDs and FPGAs.
A JTAG port typically consists of 5 lines: a clock line (TCK), a mode line (TMS), a data in line (TDI), a data out line (TDO), and a ground line. There is also an optional reset line (TRST). Some devices call their JTAG ports by other names or include JTAG as part of a debugging port (like Hitachi's H-UDI, which makes JTAG part of the embedded ICE system), and they may add additional lines (like Sun's JTAG control system used for booting certain large servers). Such extensions are beyond the scope of this article.
The data and mode signals are clocked in on the clock signal. The clock does not need to run at a constant rate, nor does it need to maintain a minimum speed. All JTAG compliant chips will likely have a maximum speed at which they support running the clock line.
The JTAG process is controlled by a state machine called the TAP controller. Transitions between states in the TAP controller are controlled by TMS, in accordance with the diagram below.

Image used with permission from G. Q. Maguire Jr.
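For readers who cannot see the diagram, the same state machine can be written down as a lookup table. The sketch below lists the sixteen TAP controller states of IEEE 1149.1 with the next state for TMS=0 and TMS=1. The names TAP_NEXT, step, and walk are made up for this illustration and do not come from any particular JTAG library.

```python
# TAP controller transitions as defined by IEEE 1149.1.
# Each entry maps a state to (next state if TMS=0, next state if TMS=1).
# One transition happens on every rising edge of TCK.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms):
    """Return the state reached after one TCK pulse with the given TMS (0 or 1)."""
    return TAP_NEXT[state][tms]


def walk(state, tms_bits):
    """Apply a sequence of TMS values, one per TCK pulse, and return the final state."""
    for tms in tms_bits:
        state = step(state, tms)
    return state
```

One useful property falls out of the table: holding TMS at 1 for five clocks reaches Test-Logic-Reset from any state, which is why a JTAG port can be reset even without the optional TRST line. For example, walk("Shift-DR", [1, 1, 1, 1, 1]) ends in "Test-Logic-Reset".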
The operation of JTAG is that you set the instruction register, then you set the data in the data register attached to that instruction. Both registers are shift registers. When the data has been entered (by clocking in data on TDI, while using TMS to control which state you are in), you switch to the Run-Test/Idle state.
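To make the interplay of TMS and TDI concrete, here is a minimal sketch that builds the list of (TMS, TDI) pairs for one scan. It assumes the TAP controller starts in Run-Test/Idle, that bits are shifted least significant bit first, and that the register is at least one bit long. The function name scan_sequence is hypothetical, and nothing here talks to real hardware.

```python
def scan_sequence(register, value, nbits):
    """Return (tms, tdi) pairs, one per TCK pulse, for a single scan.

    register is "IR" or "DR". The sequence starts and ends in
    Run-Test/Idle. value is shifted in least significant bit first.
    """
    if register not in ("IR", "DR"):
        raise ValueError("register must be 'IR' or 'DR'")
    if nbits < 1:
        raise ValueError("this sketch needs at least one bit to shift")

    # Run-Test/Idle -> Select-DR-Scan, plus one more TMS=1 to reach
    # Select-IR-Scan when the instruction register is the target.
    seq = [(1, 0)] * (2 if register == "IR" else 1)
    # Select-xR-Scan -> Capture-xR -> Shift-xR
    seq += [(0, 0), (0, 0)]
    for i in range(nbits):
        bit = (value >> i) & 1
        # TMS stays 0 to remain in Shift-xR; on the final bit TMS=1
        # shifts that bit in and moves to Exit1-xR at the same time.
        tms = 1 if i == nbits - 1 else 0
        seq.append((tms, bit))
    # Exit1-xR -> Update-xR -> Run-Test/Idle
    seq += [(1, 0), (0, 0)]
    return seq
```

For a hypothetical 4-bit instruction with the value 0101, scan_sequence("IR", 0b0101, 4) produces ten clock pulses: four to reach Shift-IR, four that carry the bits 1, 0, 1, 0 on TDI, and two to pass through Update-IR back to Run-Test/Idle.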
As just stated, every instruction has an associated data register. This data register may be 0 in length. There are several standard instructions, and many chips also support proprietary instructions. For now, we will concern ourselves only with the SAMPLE/PRELOAD instruction, the BYPASS instruction, and the EXTEST instruction. The length of the instruction register and the binary value representing each instruction are chip specific. This information may be found in the chip's manual, but the most common way to get it is to extract it from the chip's BSDL file (a plaintext file format supplied by chip makers for each chip).
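As a rough illustration of pulling these values out of a BSDL file, the sketch below reads the INSTRUCTION_LENGTH and INSTRUCTION_OPCODE attributes with regular expressions. The snippet in BSDL_SNIPPET is invented for this example (EXAMPLE_CHIP is not a real part), the function names are hypothetical, and the parsing is deliberately naive: it ignores BSDL comments and everything else in the file.

```python
import re

# Invented fragment in BSDL attribute syntax; not taken from a real chip.
BSDL_SNIPPET = '''
attribute INSTRUCTION_LENGTH of EXAMPLE_CHIP : entity is 4;
attribute INSTRUCTION_OPCODE of EXAMPLE_CHIP : entity is
    "BYPASS (1111)," &
    "EXTEST (0000)," &
    "SAMPLE (0001)";
'''


def instruction_length(bsdl):
    """Return the instruction register length declared in the BSDL text."""
    m = re.search(
        r"attribute\s+INSTRUCTION_LENGTH\s+of\s+\w+\s*:\s*entity\s+is\s+(\d+)\s*;",
        bsdl,
        re.IGNORECASE,
    )
    if m is None:
        raise ValueError("INSTRUCTION_LENGTH not found")
    return int(m.group(1))


def instruction_opcodes(bsdl):
    """Return {instruction name: [opcode bit strings]} from the BSDL text."""
    m = re.search(
        r"attribute\s+INSTRUCTION_OPCODE\s+of\s+\w+\s*:\s*entity\s+is(.*?);",
        bsdl,
        re.IGNORECASE | re.DOTALL,
    )
    if m is None:
        raise ValueError("INSTRUCTION_OPCODE not found")
    # The value is written as several quoted pieces joined with '&'.
    text = "".join(re.findall(r'"([^"]*)"', m.group(1)))
    opcodes = {}
    for name, codes in re.findall(r"(\w+)\s*\(([^)]*)\)", text):
        # An instruction may list more than one opcode, separated by commas.
        opcodes[name.upper()] = [c.strip() for c in codes.split(",")]
    return opcodes
```

With the invented snippet, instruction_length(BSDL_SNIPPET) gives 4 and instruction_opcodes(BSDL_SNIPPET) maps BYPASS, EXTEST, and SAMPLE to their bit strings. A real BSDL file is much larger and may need a proper parser.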
To do tests, the sequence of operations is to place the chips you don't care about into BYPASS mode (i.e. set their instruction registers to the BYPASS value). The other chips go into SAMPLE/PRELOAD mode. You then shift the boundary scan data into the data register (the length and meaning of this data is chip specific and is often only partially defined in the BSDL file, but valid values for the undefined parts are also given, so just worry about the part that is defined). When this is completed, the chip is put into EXTEST mode. Then the chip is placed back into SAMPLE/PRELOAD mode. At this point, as you shift in new boundary scan data, the existing data (which will now include the status of the inputs while the chip was in EXTEST) is shifted out on TDO.
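When several chips share a chain, their instruction registers form one long shift register, and a chip in BYPASS contributes a single bit to the data path. The sketch below builds the combined instruction bitstream and works out the length of the resulting data register chain. It rests on several assumptions: the chain list is ordered from TDI to TDO, opcode strings are written with the rightmost bit nearest TDO (so that bit is shifted first), and any instruction named in selected uses the boundary scan register. The Device class, the helper names, and all lengths and opcodes are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    ir_length: int        # instruction register length in bits
    opcodes: dict         # instruction name -> bit string (rightmost bit nearest TDO)
    boundary_length: int  # boundary scan register length in bits


def ir_bitstream(chain, selected):
    """Return the bits to shift into TDI, in order, to load every IR.

    chain is ordered from TDI to TDO. selected maps device name to an
    instruction name; every other device gets BYPASS.
    """
    bits = []
    # The first bits shifted in travel furthest, so start with the
    # device nearest TDO.
    for dev in reversed(chain):
        code = dev.opcodes[selected.get(dev.name, "BYPASS")]
        if len(code) != dev.ir_length:
            raise ValueError(f"opcode length mismatch for {dev.name}")
        bits.extend(int(c) for c in reversed(code))
    return bits


def dr_chain_length(chain, selected):
    """Total data register length: boundary register if selected, else 1 bit of BYPASS."""
    return sum(
        dev.boundary_length if dev.name in selected else 1
        for dev in chain
    )


# Hypothetical two-chip chain: TDI -> cpu -> cpld -> TDO.
cpu = Device("cpu", 4, {"BYPASS": "1111", "SAMPLE": "0001", "EXTEST": "0000"}, 120)
cpld = Device("cpld", 3, {"BYPASS": "111", "SAMPLE": "010", "EXTEST": "000"}, 20)
chain = [cpu, cpld]
```

With only the CPLD under test, ir_bitstream(chain, {"cpld": "SAMPLE"}) is seven bits long (three for the CPLD, then four for the CPU's BYPASS), and dr_chain_length(chain, {"cpld": "SAMPLE"}) is 21: twenty boundary scan bits plus the CPU's one bypass bit.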
To Be Continued
Ideas not fully addressed:
Be forewarned. This won't do you much good unless you can figure out BSDL files on your own. Too much of the information needed to use JTAG comes from chip specific BSDL files, so until I document that at least a little, you probably won't get far.