A Tragedy Caused by Letter Case: The Week Year in Java

2020/11/13 java

2020 turned into 2021: the story behind yyyy-MM-dd and YYYY-MM-dd


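Background

The previous post covered daylight saving time in Java. Its sample code also contained another classic formatting mistake: confusing lowercase y with uppercase Y.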

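Semantic difference

YYYY-MM-dd and yyyy-MM-dd are not the same. The one we normally want is yyyy-MM-dd.

The semantic difference between y and Y:

  • y is the ordinary calendar year
  • Y is the week year: the year that the week containing the date belongs to. A week that spans the year boundary is assigned as a whole to one of the two years, so a date in that week can get a week year that differs from its calendar year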

Letter  Date or Time Component  Presentation  Examples
y       Year                    Year          1996; 96
Y       Week year               Year          2009; 09

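A week year is usually written with the pattern Y-w, where Y is the week year and w is the number of the week within that week year.

Reference: the JDK 1.8 SimpleDateFormat documentation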
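To make the Y-w idea concrete, here is a minimal sketch that builds such a label with java.time. It assumes the ISO-8601 week definition (described in the next section) through WeekFields.ISO, so it does not depend on the default locale. The class and method names are made up for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.Locale;

public class WeekYearLabel {
    // Builds a "week year - week number" label under the ISO-8601 week definition.
    static String isoWeekLabel(LocalDate date) {
        int weekYear = date.get(WeekFields.ISO.weekBasedYear());
        int week = date.get(WeekFields.ISO.weekOfWeekBasedYear());
        // Locale.ROOT keeps the digits independent of the default locale.
        return String.format(Locale.ROOT, "%d-W%02d", weekYear, week);
    }

    public static void main(String[] args) {
        System.out.println(isoWeekLabel(LocalDate.of(2020, 12, 31))); // 2020-W53
        System.out.println(isoWeekLabel(LocalDate.of(2021, 1, 3)));   // 2020-W53
        System.out.println(isoWeekLabel(LocalDate.of(2021, 1, 4)));   // 2021-W01
    }
}
```

Note that 3 January 2021 still carries week year 2020 here: the calendar year and the week year disagree only for a few days around New Year.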

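What is a week year?

In some scenarios each week of the year is assigned a number, so that the year is divided by weeks rather than by months.

Different standards define this differently, though.

For the week that spans the year boundary, ISO-8601 defines the first week of a week year in several equivalent, mutually compatible ways:

  • the week containing the year's first working day (when Saturdays, Sundays and 1 January are not working days)
  • the week containing the year's first Thursday
  • the week containing 4 January
  • the week with the majority of its days (four or more) in January
  • the week that starts on the Monday falling between 29 December and 4 January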

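Therefore, if 1 January falls on a Monday, Tuesday, Wednesday or Thursday, it is in week 01. If 1 January falls on a Friday, Saturday or Sunday, it is in week 52 or 53 of the previous year (there is no week 00). 28 December is always in the last week of its year.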
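These rules can be checked against the JDK. The sketch below uses the ISO week fields in java.time.temporal.IsoFields; the class and helper names are only illustrative.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

public class FirstWeekRule {
    // ISO week number (1..53) of the given date.
    static int isoWeek(LocalDate date) {
        return date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    // True when 1 January of the year belongs to ISO week 01 of that same year.
    static boolean janFirstInWeekOne(int year) {
        return LocalDate.of(year, 1, 1).get(IsoFields.WEEK_BASED_YEAR) == year;
    }

    public static void main(String[] args) {
        for (int year = 2018; year <= 2023; year++) {
            LocalDate janFirst = LocalDate.of(year, 1, 1);
            System.out.println(year + ": 1 January is a " + janFirst.getDayOfWeek()
                    + " in ISO week " + isoWeek(janFirst)
                    + "; 4 January is in ISO week " + isoWeek(LocalDate.of(year, 1, 4)));
        }
    }
}
```

For every year printed, 4 January should land in week 1, and 1 January should land in week 1 only when it is a Monday through Thursday.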

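In ISO-8601, weeks start on Monday, so an ISO week year begins on the Monday of its week 01.

A week year has 52 or 53 full weeks of 7 days each, so it is 364 or 371 days long.

Reference: ISO-8601#week_dates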
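As a quick check of the 52/53 claim, the following sketch asks the JDK how many ISO weeks a given week year has. It relies on IsoFields.WEEK_OF_WEEK_BASED_YEAR reporting a range refined by the date; the helper name weeksIn is made up for this example.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

public class WeekYearLength {
    // Number of ISO weeks (52 or 53) in the given week-based year.
    static int weeksIn(int weekYear) {
        // 1 July is always well inside its own week-based year.
        LocalDate midYear = LocalDate.of(weekYear, 7, 1);
        return (int) IsoFields.WEEK_OF_WEEK_BASED_YEAR.rangeRefinedBy(midYear).getMaximum();
    }

    public static void main(String[] args) {
        for (int year = 2019; year <= 2021; year++) {
            int weeks = weeksIn(year);
            System.out.println(year + ": " + weeks + " weeks, " + (weeks * 7) + " days");
        }
        // 2019: 52 weeks, 364 days
        // 2020: 53 weeks, 371 days
        // 2021: 52 weeks, 364 days
    }
}
```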

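Examples

Take 2020-12-31 as an example. The last day of 2020 is a Thursday, and it falls in a week that spans the year boundary.

With the yyyy-MM-dd pattern, the output is 2020-12-31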

        // 2020-12-31
        Calendar calendar = Calendar.getInstance();
        calendar.set(2020, Calendar.DECEMBER, 31);
        Date date = calendar.getTime();

        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        String dateStr = format.format(date);
        System.out.println(dateStr);
        // 2020-12-31

With the YYYY-MM-dd pattern, however, the output is 2021-12-31

        // 2020-12-31
        Calendar calendar = Calendar.getInstance();
        calendar.set(2020, Calendar.DECEMBER, 31);
        Date date = calendar.getTime();

        SimpleDateFormat format = new SimpleDateFormat("YYYY-MM-dd");
        String dateStr = format.format(date);
        System.out.println(dateStr);
        // 2021-12-31

The same happens with java.time.LocalDate: with the YYYY-MM-dd pattern, the output is 2021-12-31

        // 2020-12-31
        LocalDate date = LocalDate.of(2020, 12, 31);
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("YYYY-MM-dd");
        String dateStr = date.format(formatter);
        System.out.println(dateStr);
        // 2021-12-31

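Both APIs give the same result, so it is tempting to conclude that they follow the same standard, ISO-8601.

But then a problem shows up: formatting 2020-12-27 (a Sunday) with the YYYY-MM-dd pattern prints 2021-12-27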

        // 2020-12-27
        LocalDate date = LocalDate.of(2020, 12, 27);
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("YYYY-MM-dd");
        String dateStr = date.format(formatter);
        System.out.println(dateStr);
        // 2021-12-27

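According to the ISO-8601 rules above, a week starts on Monday.

So this Sunday should belong to a week that lies entirely in week year 2020, not to the week that crosses into the new year. Shouldn't the output be 2020-12-27?

It turns out the result also depends on java.util.Locale. The Y pattern letter takes the first day of the week and the minimal number of days in the first week from the locale, not from ISO-8601. (Under strict ISO-8601 rules even 2020-12-31 stays in week year 2020, because its week has only three days in January 2021.)

  • In the United States (Locale.US) the week starts on Sunday, so formatting 2020-12-27 (a Sunday) prints 2021-12-27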

            // 2020-12-27
            LocalDate date = LocalDate.of(2020, 12, 27);
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern("YYYY-MM-dd", Locale.US);
            String dateStr = date.format(formatter);
            System.out.println(dateStr);
            // 2021-12-27
    
  • In the United Kingdom (Locale.UK) the week starts on Monday, so formatting 2020-12-27 (a Sunday) prints 2020-12-27

            // 2020-12-27
            LocalDate date = LocalDate.of(2020, 12, 27);
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern("YYYY-MM-dd", Locale.UK);
            String dateStr = date.format(formatter);
            System.out.println(dateStr);
            // 2020-12-27
    
  • In Afghanistan (locale tag ar-AF) the week starts on Saturday, so formatting 2020-12-26 (a Saturday) prints 2021-12-26

            // 2020-12-26
            LocalDate date = LocalDate.of(2020, 12, 26);
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern("YYYY-MM-dd", Locale.forLanguageTag("ar-AF"));
            String dateStr = date.format(formatter);
            System.out.println(dateStr);
            // 2021-12-26
    

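Week years can even behave differently across JDK versions, because the defaults differ (mainly which day the week starts on).

See the Stack Overflow question on how WeekFields behaves differently across JVM versions.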
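To see which week definition a locale resolves to on a particular JDK, WeekFields.of(Locale) can be inspected directly. The sketch below prints the rules for a few locales and then compares two explicit definitions for 2020-12-27. The printed locale rules may vary with the JDK's locale data, which is why the comparison uses the fixed constants WeekFields.SUNDAY_START and WeekFields.ISO. The class and helper names are illustrative.

```java
import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.Locale;

public class LocaleWeekRules {
    // Week-based year of a date under an explicit week definition.
    static int weekYear(LocalDate date, WeekFields rules) {
        return date.get(rules.weekBasedYear());
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.US, Locale.UK, Locale.getDefault()}) {
            WeekFields rules = WeekFields.of(locale);
            System.out.println(locale + ": first day of week " + rules.getFirstDayOfWeek()
                    + ", minimal days in first week " + rules.getMinimalDaysInFirstWeek());
        }

        LocalDate sunday = LocalDate.of(2020, 12, 27);
        // Sunday start, 1 minimal day: the week of 27 December already contains 1 January 2021.
        System.out.println(weekYear(sunday, WeekFields.SUNDAY_START)); // 2021
        // ISO-8601: Monday start, 4 minimal days.
        System.out.println(weekYear(sunday, WeekFields.ISO));          // 2020
    }
}
```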

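Source code analysis

Reading the source code shows how 2020 turns into 2021.

In the normal case, when the pattern yyyy-MM-dd is parsed, the letter y is mapped to the TemporalField YEAR_OF_ERA (YearOfEra), which is the ordinary year.

java.time.format.DateTimeFormatterBuilder#parseField

[Screenshot: source of DateTimeFormatterBuilder#parseField]

This is the official definition of YEAR_OF_ERA in the Java 8 documentation: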

    /**
     * The year within the era.
     * <p>
     * This represents the concept of the year within the era.
     * This field is typically used with {@link #ERA}.
     * <p>
     * The standard mental model for a date is based on three concepts - year, month and day.
     * These map onto the {@code YEAR}, {@code MONTH_OF_YEAR} and {@code DAY_OF_MONTH} fields.
     * Note that there is no reference to eras.
     * The full model for a date requires four concepts - era, year, month and day. These map onto
     * the {@code ERA}, {@code YEAR_OF_ERA}, {@code MONTH_OF_YEAR} and {@code DAY_OF_MONTH} fields.
     * Whether this field or {@code YEAR} is used depends on which mental model is being used.
     * See {@link ChronoLocalDate} for more discussion on this topic.
     * <p>
     * In the default ISO calendar system, there are two eras defined, 'BCE' and 'CE'.
     * The era 'CE' is the one currently in use and year-of-era runs from 1 to the maximum value.
     * The era 'BCE' is the previous era, and the year-of-era runs backwards.
     * <p>
     * For example, subtracting a year each time yield the following:<br>
     * - year-proleptic 2  = 'CE' year-of-era 2<br>
     * - year-proleptic 1  = 'CE' year-of-era 1<br>
     * - year-proleptic 0  = 'BCE' year-of-era 1<br>
     * - year-proleptic -1 = 'BCE' year-of-era 2<br>
     * <p>
     * Note that the ISO-8601 standard does not actually define eras.
     * Note also that the ISO eras do not align with the well-known AD/BC eras due to the
     * change between the Julian and Gregorian calendar systems.
     * <p>
     * Non-ISO calendar systems should implement this field using the most recognized
     * year-of-era value for users of the calendar system.
     * Since most calendar systems have only two eras, the year-of-era numbering approach
     * will typically be the same as that used by the ISO calendar system.
     * The year-of-era value should typically always be positive, however this is not required.
     */
    YEAR_OF_ERA("YearOfEra", YEARS, FOREVER, ValueRange.of(1, Year.MAX_VALUE, Year.MAX_VALUE + 1)),
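
The table in that Javadoc can be reproduced with a few lines. This is only an illustration of the YEAR versus YEAR_OF_ERA distinction; the class and method names are made up.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;

public class YearOfEraDemo {
    // Describes a proleptic year as "<era> <year-of-era>".
    static String describe(int prolepticYear) {
        LocalDate date = LocalDate.of(prolepticYear, 1, 1);
        return date.getEra() + " " + date.get(ChronoField.YEAR_OF_ERA);
    }

    public static void main(String[] args) {
        for (int year = 2; year >= -1; year--) {
            System.out.println("proleptic year " + year + " -> " + describe(year));
        }
        // proleptic year 2 -> CE 2
        // proleptic year 1 -> CE 1
        // proleptic year 0 -> BCE 1
        // proleptic year -1 -> BCE 2
    }
}
```

For modern dates YEAR and YEAR_OF_ERA have the same value, which is why y behaves like the ordinary year in everyday formatting.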

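For YYYY-MM-dd, when the letter Y is parsed, a week-based field printer/parser, WeekBasedFieldPrinterParser, is created instead.

java.time.format.DateTimeFormatterBuilder#parsePattern

[Screenshot: source of DateTimeFormatterBuilder#parsePattern]

This is the Java 8 documentation of the WeekBasedFieldPrinterParser class. It does depend on the current locale: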

    /**
     * Prints or parses a localized pattern from a localized field.
     * The specific formatter and parameters is not selected until the
     * the field is to be printed or parsed.
     * The locale is needed to select the proper WeekFields from which
     * the field for day-of-week, week-of-month, or week-of-year is selected.
     */
    static final class WeekBasedFieldPrinterParser implements DateTimePrinterParser {
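
Because the week definition is looked up from the formatter's locale only when a value is printed, one compiled pattern can give different week years after withLocale. The sketch below shows this; the class name, the BASE constant and the helper are illustrative. The UK result assumes, as in the examples above, that Locale.UK resolves to a Monday week start.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class LateBoundWeekYear {
    // The pattern is compiled once; the week rules are resolved later from the locale.
    private static final DateTimeFormatter BASE =
            DateTimeFormatter.ofPattern("YYYY-MM-dd", Locale.US);

    // Formats the date with the same compiled pattern but a different locale.
    static String format(LocalDate date, Locale locale) {
        return BASE.withLocale(locale).format(date);
    }

    public static void main(String[] args) {
        LocalDate sunday = LocalDate.of(2020, 12, 27);
        System.out.println(format(sunday, Locale.US)); // 2021-12-27
        System.out.println(format(sunday, Locale.UK)); // 2020-12-27
    }
}
```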

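After the pattern has been parsed, the year is computed when a date is formatted.

The key code is in java.time.temporal.WeekFields.ComputedDayOfField#localizedWeekBasedYear

[Screenshot: source of WeekFields.ComputedDayOfField#localizedWeekBasedYear]

As the code shows, computing the year starts with the week number of the date within its calendar year:

  • If it is 0, the date falls in the last week of the previous year, so the result is year - 1
  • If it is not 0, the code checks whether the current week is the one that crosses into the next year; if so, the result is year + 1

The key step in this code takes a reference day (the length of the year plus the minimal days in the first week) and computes the week number at which the next week year begins: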

int newYearWeek = computeWeek(offset, yearLen + weekDef.getMinimalDaysInFirstWeek());

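See java.time.temporal.WeekFields.ComputedDayOfField#computeWeek

[Screenshot: source of WeekFields.ComputedDayOfField#computeWeek]

Here offset is the offset between the given day and the day on which the first full week starts.

The offset is computed in java.time.temporal.WeekFields.ComputedDayOfField#startOfWeekOffset

[Screenshot: source of WeekFields.ComputedDayOfField#startOfWeekOffset]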
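
Putting the three methods together, here is a standalone sketch of the computation. It is a paraphrase based on my reading of the JDK 8 source, not the JDK code verbatim. The class name, the weekBasedYear helper and the explicit firstDayOfWeek and minimalDays parameters are assumptions added for illustration (in the JDK these values come from the enclosing WeekFields instance).

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

public class WeekYearSketch {
    // Offset that aligns day-of-year numbers with the start of week 1.
    static int startOfWeekOffset(int day, int dow, int minimalDays) {
        // Number of days at the start of the year before the first week start.
        int weekStart = Math.floorMod(day - dow, 7);
        int offset = -weekStart;
        if (weekStart + 1 > minimalDays) {
            // The leading partial week is long enough to count as week 1.
            offset = 7 - weekStart;
        }
        return offset;
    }

    // Week number of a day-of-year; 0 means the last week of the previous year.
    static int computeWeek(int offset, int day) {
        return (7 + offset + (day - 1)) / 7;
    }

    static int weekBasedYear(LocalDate date, DayOfWeek firstDayOfWeek, int minimalDays) {
        // Localized day of week: 1 is the first day of the week, 7 the last.
        int dow = Math.floorMod(date.getDayOfWeek().getValue() - firstDayOfWeek.getValue(), 7) + 1;
        int year = date.getYear();
        int doy = date.getDayOfYear();
        int offset = startOfWeekOffset(doy, dow, minimalDays);
        int week = computeWeek(offset, doy);
        if (week == 0) {
            return year - 1;
        }
        // Week number, in this year's numbering, at which the next week year begins.
        int newYearWeek = computeWeek(offset, date.lengthOfYear() + minimalDays);
        return week >= newYearWeek ? year + 1 : year;
    }

    public static void main(String[] args) {
        LocalDate lastDay = LocalDate.of(2020, 12, 31);
        System.out.println(weekBasedYear(lastDay, DayOfWeek.SUNDAY, 1)); // 2021
        System.out.println(weekBasedYear(lastDay, DayOfWeek.MONDAY, 4)); // 2020
        System.out.println(weekBasedYear(LocalDate.of(2021, 1, 1), DayOfWeek.MONDAY, 4)); // 2020
    }
}
```

With a Sunday start and one minimal day, the week of 2020-12-31 already reaches newYearWeek, so the year becomes 2021. With the ISO rules (Monday start, four minimal days) the same date stays in 2020.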