
Chinese Basics: First Step, Pinyin and Pronunciation

Note: When referring to "Chinese" throughout this entry, I am referring specifically to Mandarin Chinese. I have little to no knowledge of Cantonese, for now. Additionally, pronunciation approximations are based on American English, and I am uncertain how other English accents may affect these.


At this point I've spent many years learning, teaching, and relearning various aspects of Chinese, and when the question "Where do I start?" comes up I've always had different answers. Early on, I would have said something like "Learn your pronouns and some basic verbs and just go to town." A few years ago, I would have said "Grammar, grammar, grammar, get a solid grammar foundation then just let the pieces fill themselves in." Now, however, I realize that my mindset has been "run then walk" all along. Step one, above all else, is that you need to learn your pinyin.

So what's that?

Simply put, pinyin is the romanization of the sounds found in Chinese. The catch being, these are best approximations only. Even after learning these best approximations, the road to proper pronunciation is a long one, so don't worry if it feels daunting at first. Speaking of daunting, the introduction to pinyin also brings forth the introduction to what I consider the second most difficult part of the language itself. Tones.

You are currently reading this in English, aren't you? So unless you are also familiar with one of the numerous other tonal languages in the world, the idea of a tonal language seems completely foreign to you, right? In Mandarin Chinese, there exist five tones. Five ways to pronounce any one sound that by virtue of how it is pronounced will change the way someone understands that word.

I'll delve into tones later, but before that understand that while it seems different, it is not entirely a foreign concept to any language, let alone English. Think back to the start of the paragraph above. Both sentences ended in a question. "aren't you?" and "right?". I was not reading these to you, and while this may not be 100% accurate to all readers, a majority of you likely read this in a certain way. In that telltale way that English tends to end an interrogative. With a rising intonation to the word. This is comparable to the second tone of Chinese and is a good example of how tones are not as foreign a concept as we may initially believe when starting to learn them. Think of how you read "right?" when you know it's a question, then how you read "right." when you know it's a confirmation. One rising, and one falling. Words read the same way, with a different intonation, and because of this, a different meaning. Tones.

With that, we move to the final piece of the puzzle, and the first steps towards correctly reading pinyin: vowels and consonants. Knowing the pronunciation of A, E, I, O, and U, and how they interact with the consonants found in Chinese, is the most basic foundation on which to build the rest of your Chinese knowledge. So let's start.

Vowels and Consonants:

[A] - Pronounced like the "Ah" sound found in words like "car", or the "Ah" in "Ah, that's how you say that."

[AI] - This combination creates a sound similar to the English word "eye" or "lie."

[AO] - This combination creates a sound similar to the "ou" in "OUCH" or "loud."

[E] - Pronounced like "uh" as if you're being punched in the stomach. Or the "uh" sound in "book" but shorter.

[EI] - This combination denotes a sound similar to the long "A" sound in English. Think "ay" in words like "say" or "bay."

[O] - Pronounced as "Oh" on its own, but it gets more complicated when paired with certain consonants. For example, in words with pinyin like "wo" or "bo," where it is just the consonant and the "o," there is a very subtle "u" sound that precedes the "o." It is quite slight, but when said side by side with the next combination it becomes more recognizable.

[OU] - Pronounced like "Oh" with a bit more of a trail on the u end, similar to "oats" or "float." When the sounds "Mo" and "Mou" are pronounced side by side, the inclusion of the hidden "u" sound on "Mo" becomes much more apparent. It's a good idea to sit and listen to native speakers pronounce these words side by side to get a good understanding of that difference.

[U] - Pronounced like the "oo" sound in "zoo." Simple enough as is.

[UA] - A combination of the "oo" and "ah" sounds. Similar to what you hear from the "wa" in "water" or "swan."

[UI] - A combination of the "oo" and long a sounds. Similar to the word "way."

[UO] - A combination of the "oo" and "oh" sounds. Similar to the "wo" sound from the word "won't."

[UN] - A vowel consonant combination, noteworthy because, similar to how a consonant + O has a hidden u sound, a U + N will have a subtle hidden "e" sound between the U and N turning the pronunciation "tun" into something more like "twin" and so on.

[Ü] - Those familiar with German will be familiar with the umlaut, but for those who are not, this can be a tricky one to get used to. Sometimes also written as a v instead of ü, this is pronounced similar to the word "ewe," where the "y" sound is subtle and your lips form a U shape when pronouncing it. It may take practice to get this one down, but it is important, as there is a difference between lu and lü.

[I] - In most cases, pronounced like "ee" in "see." However, this vowel has an interesting relationship with the consonants "C, S, Z" that can create confusion with people just getting started. When pronounced following these consonants on their own, as in "Ci, Si, Zi", the "i" takes on more of an "uh" sound, similar to the "i" in "sit" with that same oomf to it, but a bit more of an "eh" sound at the start.

[IA] - Pronounced by bringing the "ee" sound and "ah" sound together, similar to how you might expect "ya" to be pronounced.

[IAN] - You may think that since the above is a "ya" sound, this would create a "ya+n" sound, but in reality the ia turns into more of an "ee" + "eh" sound.

[IE] - Similar to above, the sound created here is the "ee" + "eh" sound. Pronounced almost like the "ye" in the word "yes."


Thankfully, for the most part, consonants will follow the rules you expect them to. "Di" will sound like "dee," "dou" will sound like "dough" and so on. No real surprises until you get to a select group of troublemakers. We'll start with the simplest first, and work our way to the "but why...?" cases in a bit.

[Y] - Pronounced like you would expect usually. Like the "y" from "yes." However when it is followed by an "i" it is basically silent. Where "i" produces an "ee" sound, the word "yi" is pronounced "ee." Quite simple.

[C] - Pronounced like a "ts" sound. A quick short "ts" sound almost like the "zzzz" of an electric shock, only with a "t."

[Z] - Similar to the above, the "z" sound is like a "dz" sound. A quick "dz" like the "zzz" of an electric shock.

The Troublemakers

And now, we get into where things get different, but not entirely unmanageable. There is a special relationship between the consonants "J, Q, X" and the consonant combinations "ZH, CH, SH." Their sounds are similar, but there is a difference in where your tongue is placed in your mouth while forming each sound, and that creates a subtle difference in its pronunciation.

[J] - Pronounced similar to the "j" sound in "jeep." The "J, Q, X" sounds are not quite the same as their English counterparts. The sounds are produced by placing the tip of the tongue on your lower gum line, below your bottom front teeth, allowing the rest of the tongue to naturally push up towards the roof of your mouth, and then trying to produce the corresponding sound. If you were to try to say the word "jump" with your tongue in this position, it would sound understandable, but it should almost sound like you're speaking it with a lisp. It's similar to that "j" sound you're used to, but just not quite, and that's what you're looking for.

[Q] - Pronounced similar to the "ch" sound, like in "chip." In this instance, the "Q" is pronounced with the tongue in the same position described in the "J" section. You're looking for that telltale "close but not quite the same" sound when replicating the "ch" to know you're on the right track.

[X] - Pronounced similar to the "sh" sound, like in "sheep." The last of the collection of three that are pronounced with the tip of the tongue pointed downwards, as described in the section for "J." Try pronouncing your "sheep" with the appropriate tongue positioning to get a feel for how different the sounds are from what you're used to.

Now with those out of the way, we get into their retroflex versions. What this means, is the following sounds are produced similar to the above three, however instead of placing the tip of your tongue at the bottom of your front teeth, you are bending your tongue back and upwards, touching the tip to your hard palate, a bit back from your upper front teeth. Retroflex literally means to bend back, and as such that's what your tongue does with these sounds.

[ZH] - Pronounced similar to the "ju" sound in "jump", however unlike with "J" the tip of your tongue should be touching the roof of your mouth, behind your front teeth, and on the hard palate. Again, this should produce a sound very similar to the "ju" sound you're used to, but not quite there. And slightly different from what you heard with the "J" sound.

[CH] - Pronounced similar to "ch" like in "chip", but like the others in this group, your tongue should be matching the position described above.

[SH] - Pronounced similar to the "sh" sound like in "sheep", but as it's part of this retroflex group, your tongue should match the position described above to produce the correct sound. Remember to listen for that slight difference to know that you're getting there.

[R] - Pronounced similar to the "r" in "rough", this one is placed here because of how it relates to the positioning of your tongue like in the above examples. The "R" sound, when at the start of a word, is made with your tongue in that same upward position as the "ZH, CH, SH" sounds are.

With all of that said and done, there's one last part that always elicits a "but why...?" response when learning for the first time. And that is the relationship between "ZH, CH, SH" and the vowel "I". While practicing these "ZH, CH, SH" sounds, you may find that trying to say "ZHI" or the like can be quite awkward and difficult. While one may expect "ZHI" to be pronounced like "GEE" similar to how "JI" would be, this is not the case. For these retroflex versions of the sounds, when paired with an "I" they change the end of the word.

[ZH, CH, SH] + [I] - Simply put, when making these sounds, your tongue is already at the top of your mouth, making the "ee" sound difficult, so in the case of "ZHI, CHI, SHI" the I takes on an "R" sound. Your "Zhi" will sound similar to "jer" from "jerk", your "Chi" will sound similar to "chir" from "chirp", and your "Shi" will sound similar to "shir" from "shirt." That being said, the strength of that "R" sound can vary greatly depending on who the speaker is, to the point where it can seem almost nonexistent at times.
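For those who like to see rules written out, here is a rough Python sketch of how the "I" changes depending on the consonant in front of it, using only the approximations described in this entry. The function name `i_sound` and the labels it returns are made up for illustration, and it only handles the simple "consonant + i" case (so "zhi" or "ji", not "ai" or "ian").

```python
# Consonant groups taken from the descriptions in this entry.
RETROFLEX = ("zh", "ch", "sh")   # "i" takes on an "R" sound
SIBILANT = ("z", "c", "s")       # "i" takes on more of an "uh" sound
VOWELS = "aeiouü"

def i_sound(syllable):
    """Return a rough label for how the final "i" is pronounced.

    Only handles syllables made of a consonant (or consonant pair)
    followed by a single "i". The labels are approximations only.
    """
    s = syllable.lower()
    if len(s) < 2 or not s.endswith("i"):
        raise ValueError("expected a consonant followed by 'i'")
    initial = s[:-1]
    if any(ch in VOWELS for ch in initial):
        raise ValueError("only handles consonant + 'i', not combined vowels")
    if initial in RETROFLEX:
        return "r-like"
    if initial in SIBILANT:
        return "uh-like"
    # Everything else, including "yi", keeps the plain "ee" sound.
    return "ee"

for example in ("zhi", "si", "ji", "yi"):
    print(example, "->", i_sound(example))
```

Treat this as a memory aid rather than a pronunciation guide; your ears and native speakers are still the real reference.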

Above all, remember that these sound comparisons are all best approximations, and a starting point for you to get to where you want to be. Nailing the pronunciation is going to take time, and hours and hours of listening to native speakers from all over to get a comfortable feel, and when you're finally comfortable, you'll find some other sound that needs slight tweaking. That's okay though. Chinese is difficult, and you're already likely on track to being better at it than many people you know.

But we're also not quite done with pronunciation yet.


With one half of the equation down, we move on to the next topic. The pronunciations covered above seem like a lot, but will become second nature fairly quickly. Tones, however, are something that you may find yourself struggling with for years. I've met many people studying who choose to ignore, or only lightly touch on, tones, because after all you can still be understood, and understand, without quite grasping tones. That being said, poor tones can still lead to misunderstandings, unintended offense, and confusion when talking with native speakers. In my opinion, it's best to emphasize your understanding of tones early, and build good habits, otherwise breaking those old habits in the future can become quite obnoxious.

So what are tones? As mentioned previously in this blog, tones are simply shifts in the intonation of words that alter how that word is understood. Think of how English ends a question. With questions, you often hear a rising intonation at the end of a sentence, right? With a command, you often hear a falling intonation. The phrase "stop that" will come with a quick, curt, and short intonation on both words, to emphasize that this is a command and you should listen. This change in how something is said is the same core idea behind tones, just divorced of the meanings we typically associate with them, and on every word as opposed to the end of a sentence. With that cleared up, Chinese has more than just rising and falling tones. Mandarin Chinese in particular has five, and I will describe them, and their potential meanings, using the pinyin "Ma".

[First Tone] - Mā - Denoted by a solid line over the vowel. This tone is pronounced in a steady relatively high pitched tone. Think the last "La" in "Fa-la-la-la-la". An example of mā is the pronunciation for the character 妈, meaning "mother".

[Second Tone] - Má - Denoted by an upward rising line over the vowel. This tone is pronounced similar to how you would end a question. While it sounds like an interrogative, this sound does not mark a question like it does in English. An example of má is the pronunciation of the character 麻, which can refer to hemp or the general feeling of numbness among other things.

[Third Tone] - Mǎ - Denoted by a u/v shape over the vowel. This tone is a bit tricky for English speakers, but very much follows the path of the mark itself, as the others have. The third tone is one that will typically start high, then dips low to an almost guttural level, and ends by rising back up again. This will take some getting used to, but isn't too bad once you get the hang of it. An example of mǎ is the pronunciation for the character 马, meaning horse.

[Fourth Tone] - Mà - Denoted by a downward descending line over the vowel. This tone follows the mark as well. Start high, end low, almost like the command tone in English, only not necessarily curt and quick, just descending. An example of mà is the pronunciation for the character 骂, meaning to scold or curse.

[Fifth/Neutral Tone] - Ma - As the name suggests, this is the fifth, and sometimes just called neutral, tone. There is no special intonation to the neutral tone; it is simply a flat sound. An example of this tone can actually be found in the word 妈妈, where the first Ma is pronounced Mā and the second Ma is pronounced toneless. This word means mother/mom/mama.
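If you ever need to type these marks yourself, the five tones of "Ma" above can be sketched in a few lines of Python. This is only a rough illustration: the names `TONE_MARKS`, `mark_vowel_index`, and `add_tone` are made up for this example, and the rule for choosing which vowel carries the mark (an "a" or "e" if there is one, the "o" in "ou", otherwise the last vowel) is a commonly taught convention that is assumed here rather than something covered in this entry.

```python
import unicodedata

# Unicode combining marks for tones 1-4; the neutral tone has no mark.
TONE_MARKS = {
    1: "\u0304",  # solid line (macron), as in mā
    2: "\u0301",  # rising line (acute), as in má
    3: "\u030C",  # u/v shape (caron), as in mǎ
    4: "\u0300",  # descending line (grave), as in mà
    5: "",        # fifth/neutral tone, as in ma
}
VOWELS = "aeiouü"

def mark_vowel_index(syllable):
    """Pick the vowel that carries the mark (assumed convention)."""
    s = syllable.lower()
    for preferred in ("a", "e"):
        if preferred in s:
            return s.index(preferred)
    if "ou" in s:
        return s.index("ou")
    positions = [i for i, ch in enumerate(s) if ch in VOWELS]
    if not positions:
        raise ValueError("no vowel found in syllable")
    return positions[-1]

def add_tone(syllable, tone):
    """Return the syllable with the tone mark for tones 1-5."""
    if tone not in TONE_MARKS:
        raise ValueError("tone must be 1, 2, 3, 4, or 5")
    syllable = unicodedata.normalize("NFC", syllable)
    i = mark_vowel_index(syllable)
    marked = syllable[: i + 1] + TONE_MARKS[tone] + syllable[i + 1 :]
    # NFC folds the vowel and its mark into a single character.
    return unicodedata.normalize("NFC", marked)

print([add_tone("ma", t) for t in range(1, 6)])
# ['mā', 'má', 'mǎ', 'mà', 'ma']
```

Typing the tone as a number and letting a helper place the mark is also how many pinyin input tools behave, which is why you will often see "ma1" or "ma3" written in place of the marked forms.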

With that covered, it's important to note that there will be situations where these tones change. A Third Tone next to a Second Tone will be pronounced differently than a third tone on its own. However these exceptions will be covered in a later entry, so readers can have time to get used to the tones themselves.

Wrap Up and Ways Forward:

At this point, if I wrote this correctly, you should be feeling completely overwhelmed, but with just enough information to begin taking those first steps towards conquering that mountain that is the Chinese language. Know that it's normal to feel that way. By learning how to pronounce Chinese words correctly, you're already placing yourself in a good position to demonstrate your willingness to learn, and that alone is worth being proud of as not many people make it even this far. So what's next?

In the next lesson I plan to start introducing common words, phrases, and basic grammar structures. My recommendation, if you wish to continue, is to utilize flash cards, flash card apps, and color coding to reinforce lessons from this entry, and carry them over into your future studies. My personal method for color coding, one that coincides with some common coloring schemes I've seen, is as follows:

[First Tone] - Red

[Second Tone] - Green

[Third Tone] - Blue

[Fourth Tone] - Purple

[Fifth/Neutral Tone] - Grey

When starting off, and creating flashcards, it is extremely helpful to write the characters on the front, in the appropriate colors coinciding with their tones, and on the back of the card write the pinyin and meanings. This will shift as you get more and more proficient, but starting to associate your characters with their tones early will create an incredibly strong foundation, and the color associations make this relatively easy.
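If you build your cards digitally, the color scheme above is easy to put into a small Python sketch. The names `TONE_COLORS` and `color_front` are just placeholders for illustration, and how you actually display the colors will depend on whichever flash card app you use.

```python
# Tone number -> color, following the scheme listed above.
TONE_COLORS = {
    1: "Red",
    2: "Green",
    3: "Blue",
    4: "Purple",
    5: "Grey",  # fifth/neutral tone
}

def color_front(characters, tones):
    """Pair each character on the front of a card with its tone color."""
    if len(characters) != len(tones):
        raise ValueError("need exactly one tone per character")
    return [(ch, TONE_COLORS[tone]) for ch, tone in zip(characters, tones)]

# 妈妈: first tone followed by the neutral tone.
print(color_front("妈妈", [1, 5]))
# [('妈', 'Red'), ('妈', 'Grey')]

# 马: third tone.
print(color_front("马", [3]))
# [('马', 'Blue')]
```

The pinyin and meanings would still go on the back of the card, exactly as described above.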

I hope this lesson has given you a good enough understanding of the basics to start your journey into the depths of the Chinese language. Remember, learning a language is an endurance match, not a race, so hang in there and happy studies.

