Unicode Migration
Many developers are still confused by Unicode. Perhaps this is no surprise: it mostly works transparently in the background, so there is rarely a reason to dig deeper. However, some developers worry that Unicode could have a significant impact on their project, especially if the project predates RAD Studio 2009.
What changes in practice
- Most legacy ANSI text is converted to Unicode implicitly by RAD Studio.
- The main area you need to review is string types and how they interact with external APIs and file I/O.
- Non-string data types are largely unaffected by the migration.
- If you are already on RAD Studio 2009 or later, you are almost certainly using Unicode today.
What is Unicode?
Unicode is a standard for representing a far larger set of characters than older encodings could: accented letters, ligatures such as æ, non-Latin scripts, symbols, icons, and emoticons such as the ubiquitous smiley face. It replaces the more limited ANSI code page encodings that Windows applications traditionally used.
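RAD Studio's string type (UnicodeString) stores text as UTF-16, where each Char is a 2-byte code unit. Most characters fit in one code unit, but characters outside the Basic Multilingual Plane, including most emoji, need a surrogate pair of two units. This is why Length on a Delphi string holding a single smiley face reports 2. The minimal sketch below illustrates this in standard C++, assuming std::u16string as a stand-in for UnicodeString (same encoding, different API). `EncodeCodePoint` and `CountCodePoints` are hypothetical helpers, not RAD Studio functions.

```cpp
#include <cstddef>
#include <string>

// std::u16string holds UTF-16 code units, the same encoding RAD Studio's
// UnicodeString uses, so it serves as a portable stand-in here.

// Encodes one Unicode code point as UTF-16.
// Assumes cp is a valid scalar value (not a surrogate, at most 0x10FFFF).
std::u16string EncodeCodePoint(char32_t cp) {
    if (cp < 0x10000) {
        return std::u16string(1, static_cast<char16_t>(cp));  // fits in one unit
    }
    cp -= 0x10000;  // the remaining 20 bits are split across two units
    return std::u16string{
        static_cast<char16_t>(0xD800 + (cp >> 10)),   // high surrogate
        static_cast<char16_t>(0xDC00 + (cp & 0x3FF))  // low surrogate
    };
}

// Counts code points, treating each valid surrogate pair as one character.
// s.size(), by contrast, counts code units, like Length() on a Delphi string.
std::size_t CountCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        bool isHigh = u >= 0xD800 && u <= 0xDBFF;
        if (isHigh && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            ++i;  // skip the low surrogate of the pair
        }
        ++count;
    }
    return count;
}
```

For example, `EncodeCodePoint(0x1F600)` (a grinning face) yields the two code units 0xD83D and 0xDE00, while `CountCodePoints` on that string returns 1.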
Here is a brief, practical overview of how Unicode affects your project.
How does RAD Studio deal with Unicode?
For most legacy code, conversion from the older ANSI encoding to Unicode is implicit, which is good news for your projects. RAD Studio automatically converts ANSI characters and maps them to Unicode, so as a developer you often have very little to do.
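Conceptually, the implicit conversion maps each byte of an ANSI string to a UTF-16 code unit according to a code page. The sketch below assumes the Windows-1252 code page common on Western European systems and decodes it by hand, purely to show what that mapping looks like; in practice RAD Studio's runtime performs the conversion for you, so you would not write this yourself. `AnsiToUtf16` is a hypothetical name, and substituting U+FFFD for the five unassigned bytes is a simplification that may not match what Windows itself does.

```cpp
#include <string>

// Unicode values for bytes 0x80-0x9F in Windows-1252 (the assumed code page).
// 0 marks the five bytes the code page leaves unassigned.
static const char16_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,  // 80-87
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,       // 88-8F
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,  // 90-97
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178   // 98-9F
};

// Decodes Windows-1252 bytes to UTF-16. Every byte becomes exactly one
// UTF-16 code unit, since all Windows-1252 characters lie in the BMP.
std::u16string AnsiToUtf16(const std::string& ansi) {
    std::u16string out;
    out.reserve(ansi.size());
    for (unsigned char b : ansi) {
        if (b >= 0x80 && b <= 0x9F) {
            char16_t mapped = kCp1252High[b - 0x80];
            out.push_back(mapped != 0 ? mapped : u'\uFFFD');  // U+FFFD for unassigned bytes
        } else {
            // 0x00-0x7F (ASCII) and 0xA0-0xFF share their values with Unicode code points.
            out.push_back(static_cast<char16_t>(b));
        }
    }
    return out;
}
```

For instance, the byte 0x80 becomes U+20AC (€), whereas bytes 0xA0 to 0xFF, such as 0xE9 (é), map to the code point with the same value.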
In practice, understanding Unicode in RAD Studio mostly means understanding the changes to string types: since RAD Studio 2009, string is an alias for UnicodeString (UTF-16), and Char and PChar map to WideChar and PWideChar. The remaining data types are largely unaffected.
If your application isn't multilingual and uses simple ANSI strings to display labels, text, messages, and data, you are unlikely to run into any major issues, again thanks to RAD Studio's implicit conversion.
If your application has already been migrated to RAD Studio 2009 or later and is working, then you are probably using Unicode already, even if you were not aware of it.
Finally, Unicode support has been part of RAD Studio since the 2009 release, more than 15 years ago. The compiler and libraries are mature and handle the majority of tasks, from simple ones to the more complex cases that can arise when exchanging Unicode-encoded data with third-party systems.
Helpful articles on Unicode
Delphi Unicode Migration — Cary Jensen White Paper
Start from the beginning for the history, or jump straight to page 6 for the implementation details.
www.embarcadero.com
Delphi and Unicode — Marco Cantù
An in-depth look at Unicode in Delphi covering string types, encoding, and practical migration guidance.
d2ohlsp9gwqc7h.cloudfront.net
Enabling Applications for Unicode
A practical checklist of pitfalls to look out for in legacy applications.
docwiki.embarcadero.com
Unicode in RAD Studio
Under-the-hood details when you really need to understand the implementation.
docwiki.embarcadero.com
Tools for Unicode Migration
Embarcadero provides a tool, the Unicode Statistics Tool, that scans your existing legacy code and highlights places you may want to review for Unicode compatibility. It helps you estimate the time and effort needed to migrate by listing:
- All used units (including Delphi units) and how many times each one is used.
- Number of files and lines of code.
- Instances of key tokens such as String, Read, Write, SizeOf, and more.
Unicode Statistics Tool
Download and scan your legacy codebase to estimate Unicode migration effort and identify hotspots.
drive.google.com
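Tokens such as SizeOf, Read, and Write deserve attention because legacy code often computed buffer sizes from character counts, for example `Stream.Write(Pointer(S)^, Length(S))`. That was correct when a Char was 1 byte, but since RAD Studio 2009 a Char is 2 bytes, so the same call writes only half of the string's data; the usual fix is to multiply by `SizeOf(Char)`. The following standard C++ sketch reproduces the pitfall, assuming std::u16string as a stand-in for UnicodeString; `SerializeLegacy`, `SerializeFixed`, and `Deserialize` are hypothetical helpers, not RAD Studio APIs.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Legacy pattern: the byte count is taken from the character count.
// Correct when one character was one byte; with 2-byte UTF-16 characters
// it copies only the first half of the string's bytes.
std::vector<unsigned char> SerializeLegacy(const std::u16string& s) {
    std::size_t byteCount = s.size();  // bug: counts characters, not bytes
    std::vector<unsigned char> bytes(byteCount);
    if (byteCount != 0) std::memcpy(bytes.data(), s.data(), byteCount);
    return bytes;
}

// Fixed pattern: characters times bytes per character, the C++ counterpart
// of Length(S) * SizeOf(Char) in Delphi. Uses the machine's native byte order.
std::vector<unsigned char> SerializeFixed(const std::u16string& s) {
    std::size_t byteCount = s.size() * sizeof(char16_t);
    std::vector<unsigned char> bytes(byteCount);
    if (byteCount != 0) std::memcpy(bytes.data(), s.data(), byteCount);
    return bytes;
}

// Reads UTF-16 code units back from a byte buffer; a trailing odd byte is ignored.
std::u16string Deserialize(const std::vector<unsigned char>& bytes) {
    std::u16string s(bytes.size() / sizeof(char16_t), u'\0');
    if (!s.empty()) std::memcpy(&s[0], bytes.data(), s.size() * sizeof(char16_t));
    return s;
}
```

With the five-character text `u"Hello"`, the legacy version produces 5 bytes and the fixed version 10, and only the fixed bytes round-trip back to the original string.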
Unicode Migration Videos
https://www.youtube.com/watch?v=7EnKXZ9-Pc8
General migration webinar. For the Unicode-specific part, jump to 00:15:17–00:42:19 to see the Unicode Statistics Tool in action, along with common surprises you may encounter.
https://www.youtube.com/watch?v=QgSiQiE8lKg&t=2s
Unicode migration walkthrough for C++Builder projects, covering the key string type and API changes when moving C++ codebases to Unicode.
