Why does Char.IsDigit return true for chars that can't be parsed to int?
- Author: 正宗卖男孩的小火柴
- Source: 51数据库
- 2023-02-13
Problem description
I often use Char.IsDigit to check whether a char is a digit. It is especially handy in LINQ queries as a pre-check before int.Parse, as here: "123".All(Char.IsDigit).
But some chars are digits and still can't be parsed to int, such as ٣ (U+0663, ARABIC-INDIC DIGIT THREE).
    // true
    bool isDigit = Char.IsDigit('\u0663');

    var cultures = CultureInfo.GetCultures(CultureTypes.SpecificCultures);
    int num;

    // false
    bool isIntForAnyCulture = cultures
        .Any(c => int.TryParse('\u0663'.ToString(), NumberStyles.Any, c, out num));
Why is that? Does this mean my Char.IsDigit pre-check for int.Parse is incorrect?
There are 310 chars which are digits:
    List<char> digitList = Enumerable.Range(0, UInt16.MaxValue)
        .Select(i => Convert.ToChar(i))
        .Where(c => Char.IsDigit(c))
        .ToList();
Here's the implementation of Char.IsDigit in .NET 4 (ILSpy):
    public static bool IsDigit(char c)
    {
        if (char.IsLatin1(c))
        {
            return c >= '0' && c <= '9';
        }
        return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber;
    }
So why are there chars that belong to the DecimalDigitNumber category ("Decimal digit character, that is, a character in the range 0 through 9...") but can't be parsed to an int in any culture?
Char.IsDigit returns true for every character in the Unicode "Number, Decimal Digit" (Nd) category, as listed here:
http://www.fileformat.info/info/unicode/category/Nd/list.htm
That does not mean the character is a valid numeric character in the current locale. In fact, int.Parse() only parses the ASCII digits 0 through 9, regardless of the locale setting.
For example, this doesn't work (it throws a FormatException):
    int test = int.Parse("\u0663", CultureInfo.GetCultureInfo("ar"));
This fails even though ٣ (U+0663) is a valid Arabic digit character and "ar" is the Arabic locale identifier.
The Microsoft article "How to: Parse Unicode Digits" states that:
"The only Unicode digits that the .NET Framework parses as decimals are the ASCII digits 0 through 9, specified by the code values U+0030 through U+0039. The .NET Framework parses all other Unicode digits as characters."
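Given that rule, one way to make the pre-check match what int.Parse accepts is to test for the ASCII range directly instead of calling Char.IsDigit. The following is a minimal sketch; the class and method names (AsciiDigitCheck, IsAsciiDigitsOnly) are hypothetical and not part of the .NET API.

```csharp
using System;
using System.Linq;

public static class AsciiDigitCheck
{
    // Hypothetical helper: true only if the string is non-empty and
    // every char lies in the ASCII digit range U+0030..U+0039.
    public static bool IsAsciiDigitsOnly(string s)
    {
        return !string.IsNullOrEmpty(s) && s.All(c => c >= '0' && c <= '9');
    }

    public static void Main()
    {
        Console.WriteLine(IsAsciiDigitsOnly("123"));       // True
        Console.WriteLine(IsAsciiDigitsOnly("12\u0663"));  // False: U+0663 is a digit, but not ASCII
        Console.WriteLine("12\u0663".All(char.IsDigit));   // True: Char.IsDigit accepts any Nd character
    }
}
```

Even an ASCII-only string can still overflow an int, so int.TryParse remains the more robust way to decide whether a string is parseable.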
However, note that you can use char.GetNumericValue() to convert a Unicode numeric character to its numeric equivalent as a double.
The return value is a double rather than an int because some numeric characters represent fractions:
    Console.WriteLine(char.GetNumericValue('\u00BC')); // '¼', prints 0.25
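To make the distinction concrete, here is a small sketch comparing Char.IsDigit, Char.IsNumber and Char.GetNumericValue on three sample characters. The class name NumericValueDemo and the choice of characters are assumptions made for illustration; values are formatted with the invariant culture so the output does not depend on the machine's locale.

```csharp
using System;
using System.Globalization;

public static class NumericValueDemo
{
    public static void Main()
    {
        var inv = CultureInfo.InvariantCulture;

        // Arabic-Indic digit three: a decimal digit (category Nd) with value 3.
        Console.WriteLine(char.IsDigit('\u0663'));                       // True
        Console.WriteLine(char.GetNumericValue('\u0663').ToString(inv)); // 3

        // Vulgar fraction one quarter: numeric, but not a decimal digit.
        Console.WriteLine(char.IsDigit('\u00BC'));                       // False
        Console.WriteLine(char.IsNumber('\u00BC'));                      // True
        Console.WriteLine(char.GetNumericValue('\u00BC').ToString(inv)); // 0.25

        // A letter has no numeric value, so GetNumericValue returns -1.
        Console.WriteLine(char.GetNumericValue('A').ToString(inv));      // -1
    }
}
```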
You could use something like this to convert all decimal digit characters in a string into their ASCII equivalents (requires using System.Text):
    public string ConvertNumericChars(string input)
    {
        StringBuilder output = new StringBuilder();

        foreach (char ch in input)
        {
            if (char.IsDigit(ch))
            {
                double value = char.GetNumericValue(ch);

                // Only map whole values 0..9 onto the ASCII digits.
                if ((value >= 0) && (value <= 9) && (value == (int)value))
                {
                    output.Append((char)('0' + (int)value));
                    continue;
                }
            }

            // Everything else is copied through unchanged.
            output.Append(ch);
        }

        return output.ToString();
    }
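As a usage sketch, the converted string can then be passed to int.Parse. The example below repeats the method above (made static) so that it compiles on its own; the class name DigitNormalizer and the sample input are assumptions for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DigitNormalizer
{
    // Same logic as ConvertNumericChars above, repeated so the sample is self-contained.
    public static string ConvertNumericChars(string input)
    {
        var output = new StringBuilder();
        foreach (char ch in input)
        {
            if (char.IsDigit(ch))
            {
                double value = char.GetNumericValue(ch);
                if (value >= 0 && value <= 9 && value == (int)value)
                {
                    output.Append((char)('0' + (int)value));
                    continue;
                }
            }
            output.Append(ch);
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Arabic-Indic digits one, two, three (sample input chosen for this sketch).
        string arabicIndic = "\u0661\u0662\u0663";

        string ascii = ConvertNumericChars(arabicIndic);
        Console.WriteLine(ascii);   // 123

        // After normalization, int.Parse succeeds.
        int parsed = int.Parse(ascii, CultureInfo.InvariantCulture);
        Console.WriteLine(parsed);  // 123
    }
}
```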