In the context of the Banque de France’s monthly business survey, this document presents the main findings of the textual analysis of business leaders’ comments. First, the richness of these data is illustrated by an elementary sentiment index and by the keyword-based identification of the main social movements since 2009. The article then presents two statistical applications and discusses their reproducibility. The first, applied to the 2018 yellow vests and the 2019 strikes, aims to estimate the impact on GDP of an event whose effect is unequivocal. The second, illustrated by the study of Brexit, aims to characterize the effects of a complex event with multiple impacts, using a supervised learning model and word vectors.
The monthly business survey is conducted through semi-structured telephone interviews, which are recorded both as opinion scales and as a text summary of the information provided during the interview. These comments clarify the answers with contextual elements useful for economic analysis (figures, factual information on the company's current situation and on its markets). This working paper takes stock of the main results obtained from this complementary textual data since this line of analysis was launched in September 2018. A corpus of more than 500,000 documents has been built up, corresponding to company responses to the industry, market services and construction surveys collected between 2009 and 2021.
The richness and information content of these data are first illustrated through the construction of a textual sentiment index (SI). Built from a dictionary of polarized words (positive, neutral, negative), the SI shows correlations with the business climate index (BCI) of between 0.6 and 0.9, depending on the survey and the method of calculation. This result indicates that textual data can largely reproduce the quantitative information contained in the opinion balances, and suggests that the SI may be useful for GDP nowcasting. The article then presents a keyword identification method that traces the occurrence of certain themes in comments over time. Its relevance is demonstrated through the study of the various social movements that have occurred over the last ten years (see graph), whose relative importance in the concerns of the companies surveyed can be compared using this method.
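As a rough illustration of how a dictionary-based sentiment index can be computed and compared with a reference series, the sketch below scores each comment with a toy lexicon and averages the scores by month. The lexicon, the function names and the averaging rule are assumptions made for this example; the actual dictionary and calculation methods used for the SI are not specified here.

```python
import math
import re
from collections import defaultdict

# Hypothetical toy lexicon: +1 positive, -1 negative; unlisted words are neutral.
POLARITY = {"increase": 1, "strong": 1, "recovery": 1,
            "decline": -1, "weak": -1, "shortage": -1}


def comment_score(text):
    """Mean polarity of the lexicon words found in one comment (0.0 if none)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [POLARITY[t] for t in tokens if t in POLARITY]
    return sum(hits) / len(hits) if hits else 0.0


def sentiment_index(comments):
    """comments: iterable of (month, text) pairs. Returns {month: mean score}."""
    by_month = defaultdict(list)
    for month, text in comments:
        by_month[month].append(comment_score(text))
    return {m: sum(s) / len(s) for m, s in sorted(by_month.items())}


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

In such a setup, the monthly values returned by `sentiment_index` would be aligned with the BCI for the same months and passed to `pearson` to obtain a correlation of the kind reported above.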
Such a count can then be fed into an econometric calculation to estimate the impact on GDP. By way of illustration, an estimated 20% of companies saw their activity disrupted by the yellow vest movement in November and December 2018, leading to an estimated loss of 0.15 points of GDP in Q4 2018. This valuation method can easily be applied to any other case that comes down to the study of two groups of firms, those affected and those not affected by an event.
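The two-group logic can be sketched as follows: comments are flagged by keyword, the share of affected firms is computed, and the gap in reported activity between the two groups is scaled by that share. This is a simplified difference in means, assumed here for illustration only; it is not the econometric specification used in the paper, and the keyword list, function names and activity measure are hypothetical.

```python
# Hypothetical keyword list used to flag comments that mention the event.
KEYWORDS = ("yellow vest", "gilets jaunes", "blockade")


def mentions_event(comment, keywords=KEYWORDS):
    """True if the comment contains at least one of the keywords."""
    text = comment.lower()
    return any(k in text for k in keywords)


def affected_share(comments):
    """Share of comments that mention the event (0.0 for an empty list)."""
    flags = [mentions_event(c) for c in comments]
    return sum(flags) / len(flags) if flags else 0.0


def aggregate_effect(records):
    """records: list of (comment, activity_change) pairs.

    Returns share_affected * (mean change of affected firms
    - mean change of unaffected firms), or 0.0 if either group is empty.
    This is an unweighted difference in means, an assumption of this sketch.
    """
    affected = [y for c, y in records if mentions_event(c)]
    unaffected = [y for c, y in records if not mentions_event(c)]
    if not affected or not unaffected:
        return 0.0
    gap = sum(affected) / len(affected) - sum(unaffected) / len(unaffected)
    return len(affected) / len(records) * gap
```

A real application would additionally weight firms by their size or sector and translate the activity gap into value added before expressing it in points of GDP; those steps are left out of this sketch.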
A more complex case that does not fit this dichotomous partition is developed in the last part of the article and applied to assess the effects of Brexit. Restricting the corpus to documents that mention Brexit, we identify 6 non-exclusive categories of corporate reactions. After the documents are labeled and projected into a word2vec lexical embedding space, a logit-lasso model is trained. Its predictions allow us to trace, in a quantified way, the chronology and nature of French companies' concerns regarding Brexit since 2016. In particular, this analysis highlights a first peak in concerns immediately after the referendum in the summer of 2016, characterized by currency effects related to the depreciation of the pound. Subsequently, two other peaks in concerns, in the spring and fall of 2019, correspond to the successively extended deadlines for the UK's exit from the European Union: due to fears of a border closure, these periods are characterized by a high level of uncertainty and by supply and demand adjustments on both sides of the Channel. Provided the typology and labeling are adapted, this methodology can be reproduced to analyze a future event whose impact is multiple and whose effects need to be monitored in evolution and magnitude over time.
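The pipeline described above (document embedding followed by an L1-penalized logistic regression, with one binary classifier per non-exclusive category) can be sketched in plain Python. Everything specific in this example is an assumption: the two-dimensional word vectors stand in for a trained word2vec model, documents are embedded by simple averaging, the lasso penalty is handled by proximal gradient descent, and the two labels are hypothetical rather than the paper's six categories.

```python
import math

# Hypothetical 2-d word vectors standing in for a trained word2vec model.
EMBEDDINGS = {
    "pound": (1.0, 0.0), "currency": (0.9, 0.1), "exchange": (0.8, 0.0),
    "border": (0.0, 1.0), "customs": (0.1, 0.9), "stock": (0.0, 0.8),
}


def embed(text, dim=2):
    """Average the vectors of known words; zero vector if none is known."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return [0.0] * dim
    return [sum(v[j] for v in vecs) / len(vecs) for j in range(dim)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_logit_lasso(X, y, lam=0.01, lr=0.5, steps=2000):
    """L1-penalized logistic regression (intercept unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        errs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - yi
                for x, yi in zip(X, y)]
        grad_w = [sum(e * x[j] for e, x in zip(errs, X)) / n for j in range(d)]
        grad_b = sum(errs) / n
        # Gradient step on the log-loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * g, lr * lam) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(model, x):
    """Predicted probability that the document belongs to the category."""
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_all(docs, labels):
    """labels: {category: list of 0/1}. Fits one classifier per category."""
    X = [embed(d) for d in docs]
    return {cat: fit_logit_lasso(X, y) for cat, y in labels.items()}
```

Because the categories are non-exclusive, each one gets its own classifier, and a document can receive a high probability for several categories at once. Counting, month by month, the documents whose predicted probability exceeds a chosen threshold would then give a chronology of the kind described above.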
Updated on: 07/02/2021 14:11