
A Sociologist's Field Notes to the Mass Observation Archive: A Consideration of the Challenges of 're-Using' Mass Observation Data in a Longitudinal Mixed-Methods Study

by Rose Lindsey and Sarah Bulloch
University of Southampton; University of Southampton

Sociological Research Online, 19 (3), 8
<http://www.socresonline.org.uk/19/3/8.html>
DOI: 10.5153/sro.3362

Received: 20 Sep 2013 | Accepted: 25 Jun 2014 | Published: 15 Aug 2014


Abstract

This paper explores the challenges arising from the 're-use' of Mass Observation Project (MOP) writing (1981 to present day) encountered by the authors when setting up an Economic and Social Research Council (ESRC) funded, longitudinal, mixed-methods research project on civic engagement. The paper begins with a brief review of the present UK social science research environment, highlighting the evidence for an increasing Research Council focus on interdisciplinary research and secondary analysis/re-use of data. It argues that this shift in focus gives rise to unique methodological challenges such as those encountered by the authors in this project. After providing some background and context, the paper discusses different obstacles encountered in the course of setting up this project. These include difficulties in: communicating within and across disciplines; re-using data across disciplines; the use of metadata, and its role in choosing writers from a longitudinal secondary data source; choice of analytical tools and approaches; and the Mass Observation writer's role in the research process. By sharing these experiences, the paper seeks to enable potential users of the MOP to see the value of MOP as a source of longitudinal qualitative secondary data; appreciate its potential for use with other data sources and across different disciplines; and equip other researchers to meet some of the challenges that the longitudinal use of MOP writing throws up.

Keywords: Mass Observation Project, Secondary Data Analysis, Interdisciplinary, Mixed-Methods, Longitudinal

Introduction

1.1 There is increasing emphasis within the social science community on the use of secondary data and on interdisciplinary working. These shifts are reflected in drivers such as the Research Councils' 'impact agenda' and reduced public funds for primary research. The coming together of interdisciplinary working and secondary data agendas presents particular methodological challenges to the research community. This paper contextualises these challenges by providing evidence drawn from the practical experience of using Mass Observation Project (MOP) data when setting up a longitudinal, mixed-method, research project on civic engagement.

1.2 The paper provides a review of the present UK social science research environment, highlighting the evidence for an increasing focus on interdisciplinary research based on secondary data. It argues that this focus presents a range of obstacles for researchers, and draws on the authors' experience of setting up a research project funded under the ESRC's Secondary Data Analysis Initiative Phase 1 call, in which some of these difficulties have been faced.

1.3 Themes that are identified as challenges for the secondary data analyst include communicating effectively with other users of MOP writing, and bridging boundaries in the language that is used to talk about this 'data'; adapting to the infrastructure that surrounds the MOP data and adding on/contributing to this; deciding on an appropriate way of choosing individual MOP writers to follow in a project; implementing a chosen analytical approach that uses distinct analytical tools; and conducting secondary analysis whilst also retaining a sense of the way in which the data were produced.

1.4 In sharing our experience of setting up our ongoing research project, we seek not only to highlight the value of MOP as a source of longitudinal qualitative secondary data that can be used in conjunction with other data, and across different disciplines, but to equip other researchers to meet some of the challenges that using the MOP throws up. We also seek to open up an academic debate on method and the longitudinal 'reuse' of MOP writing. The paper concludes with reflections on ways forward for the steadily expanding community of MOP users and archivists in addressing the challenges outlined.

The current UK social research environment

Interdisciplinary research

2.1 In its role as major funder of high quality social research in the UK, the ESRC explicitly highlights interdisciplinarity, alongside innovation and impact, as a desirable element of prospective research projects that it will support[1]. However, the Research Council is less explicit about what it means by this term. Tait and Lyall (2007) define interdisciplinary research as 'occurring where the contributions of the various disciplines are integrated to provide holistic or systemic outcomes' (p.1). This is in contrast to multidisciplinary research 'where each discipline works in a self-contained manner with little cross-fertilization among disciplines or synergy in the outcomes' (p.1). Griffin et al (2006) also view interdisciplinarity as an integrative process 'where elements from different disciplines are integrated, in a crossing of traditional disciplinary lines without an aim to challenge the borders of the disciplines'. However, interdisciplinarity can also adopt a more critical position, striving to push the borders of disciplines.

2.2 There has been a gradual, growing focus on funding interdisciplinary research over the last 20 years. The European Commission's Framework Projects began promoting interdisciplinary research in the 1990s (Hicks and Katz 1996; Bruce et al 2004); and since 2001 the ESRC, and its arts and humanities counterpart the Arts and Humanities Research Council (AHRC), have run cross-council interdisciplinary research programmes (Griffin et al 2006).

2.3 However, despite over 20 years of funding-focus on interdisciplinary working, this approach still gives rise to a unique set of difficulties. These include difficulties in: winning funding in the first place; ensuring genuine integration rather than individual work in disciplinary silos; publishing outcomes within discipline-specific journals; career progression for interdisciplinary researchers (exacerbated by the requirement for all publications to fit under the remit of a discipline-specific 'unit' for the Research Excellence Framework); and ensuring findings and outputs are accessible to academics from other disciplines (Crow et al 2011).

2.4 The MOP has traditionally been used in a multidisciplinary way, with different disciplines across the social sciences and humanities using this resource separately. Recently, however, a growing group of researchers have been interested in the MOP as a rich data source, and the archivists have engaged in a range of initiatives to bring together users from across the different disciplines.[2] Yet this traffic around the archive is only slowly leading to a shared, accessible and interdisciplinary scholarship. Throughout this paper we highlight examples of how this increased interest in the archive across disciplines is difficult to translate into consensual interdisciplinarity.

Re-use of data

2.5 Re-use of data is the 'bread and butter' of the historian (Langhamer, 2008, p.1), but has not been as commonplace across the social sciences, particularly not within work employing qualitative methods. Whilst the concept of re-using data has been embodied and promoted through the UK Data Archive since 1977, and through its qualitative archive 'Qualidata', which was formally set up in 1994 (and then subsumed within the UK Data Service in 2013), the promotion of the use of secondary data is a more recent trend amongst funders. Examples of recent funding initiatives include the ESRC's Secondary Data Analysis Initiatives (£8.2million for Phase 1, Phase 2 estimated £5million[3]) and the ESRC's Big Data Investment (£64 million[4]). This trend is echoed in Government, with the Office for National Statistics' exploration of alternatives to the current Census that include combining existing administrative data sources. It seems likely that the trend of funding re-use of existing data resources, rather than the creation of new data, will persist in an austerity-led funding environment.

2.6 Do these funding initiatives impact on the levels of re-use of different types of data within the social sciences? Mason, writing in 2007, expressed concern that 'qualitative resources and data come to be seen as the poor relation to quantitative survey data, and certainly they continue to attract relatively speaking only a small proportion of the available funds' (section 2.3). This trend for quantitative-led funding initiatives may also be reflected beyond the Research Council. In a recent circular email to its research community, the Joseph Rowntree Foundation announced that research commissioned for its Poverty and Ethnicity programme would now 'mainly focus on quantitative research'[5].

2.7 Is this focus on quantitative data over qualitative data reflected in researchers' responses to funding calls for data reuse? The authors note that in Phase 1 of the recent ESRC Secondary Data Analysis Initiative, only 3 out of 58 of the projects funded opted to re-use qualitative data, and all three were mixed-method projects, thus also re-using quantitative data. The reasons for this weighting towards quantitative data are unclear. Was this a result of researcher-led bias, leading to few applications for re-use of qualitative data? Was it a reflection of reviewers' bias towards quantitative data? Or was it a logistical problem with researchers identifying that since it can take longer to analyse discrete bodies of qualitative data than it can quantitative data, the funding and time available for the Phase 1 initiative did not lend itself to qualitative projects[6]?

2.8 The process of re-using qualitative data also comes with a unique set of problems and debates, including a concern that data will be reused in ways that the original researchers had not intended. There is already a body of literature on the issue of reusing qualitative data, see for example, Moore (2007), Mason (2007), Irwin (2010), Irwin and Winterton (2012a, 2012b)[7]. The concerns of this paper relate to the potential re-use of MOP data for longitudinal analysis within the social sciences and across other disciplines.

Background to the project

3.1 The MOP represents a key national, qualitative, secondary data resource. Since 1981, a national panel of volunteer writers has contributed to the MOP in response to themed questions or 'directives' that are sent to them three times a year. Their responses, physically stored at The Keep in Brighton[8], provide a rich account of individuals' activities and attitudes towards a range of topics over time. Although the MOP represents a unique source of longitudinal data, offering potential insight into changes and continuities in individual writers' lives across time, academics have tended to use it thematically and cross-sectionally, focusing on responses to a given theme at given points in time.

3.2 In 2012 the ESRC advertised its intention to fund its Secondary Data Analysis Initiative Phase 1, inviting proposals for 'small scale research projects exploiting existing major data resources created either by the ESRC and other agencies.'[9] We were successful in gaining funding for a mixed-method, longitudinal project (1981-2012) on civic engagement, re-using qualitative data drawn from writing from the MOP, and quantitative data drawn from longitudinal and cross-sectional surveys including Understanding Society/British Household Panel Survey, the General Lifestyle Survey, and the British Social Attitudes Survey.

3.3 The research project, which explores individual attitudes and behaviours towards volunteering, and individual views on the role and responsibility of the State, across time, is the first to use the MOP as a longitudinal qualitative data source. At the time of writing, we are in the process of analysing 38 individual MOP writers' responses to 15 different directives across 1981-2012 and bringing this into an iterative analytical dialogue with the quantitative longitudinal data sources.

3.4 This paper adopts a broadly chronological approach to the challenges that have emerged when setting up the project, during the process of reusing MOP writing, and when engaging with the interdisciplinary nature of the archive.

Communicating within and across disciplines

4.1 During a recent conference held by the Mass Observation Archive (MOA)[10], at which we presented a paper, there was considerable discussion of the terminologies and methodologies being used by the different disciplines that are researching the MOA. Although there was cross-disciplinary interest in the substantive themes of individual papers, some debates arose between historians and sociologists. These focused on sociological terminology, and the sociological rendering of rich, emotive, individual 'writing' or 'material' into the 'rational' 'scientific' language of 'data', performing the observations on cross-disciplinary language and accessibility made by Crow et al in their paper of 2011.

4.2 We were interested in the way in which the language and terminology of the different disciplines led to such quick disciplinary distancing and sense of division. Given the shared interest of these disciplines in the MOA, and in the findings of each other's work, this entrenchment seems illogical. In response to this observation, Rose (one of the authors) has subsequently worked in partnership with the MOA to bring together a group of academics from three different disciplines to develop a cluster of interdisciplinary projects (historians, sociologists and political scientists working together on the same projects). These projects aim to work with MOA data/material in a variety of different ways. Approaches include the reuse of early and late archival material; longitudinal analysis of MOP writing; and bringing MOP writing into dialogue with other data sources. A methodologist is also working with the cluster to explore the boundaries between different disciplinary methods, terminologies, and language. During the process of developing this proposal, negotiation skills, open minds, and a commitment to translating different analytical cultures and languages were required. If successful in achieving funding, the cluster has the potential to result in a genuine interdisciplinary dialogue of ideas, terminologies and methods clustered around the use of MOA data/material.[11]

4.3 With this issue of disciplinary translation in mind, although the target audience of this special issue of Sociological Research Online is sociologists, we hope that the issue reaches existing and potential MOA users beyond the discipline of sociology. We aim to ensure that this paper is linguistically accessible to non-sociologists, and that the challenges described below are of relevance to a mixed disciplinary audience.

Reusing MOP across disciplines and the role of meta-data

Missing meta-data

5.1 To date, the vast majority of the MOP writing is in hard copy format stored at The Keep[12]. Only a very small proportion of the writing can be accessed electronically, and permission for access to electronic material is negotiated through the archivists[13]. The fact that the writing is largely in hard copy sets challenges for researchers seeking to design research projects. Without physically visiting the archive, it is difficult to gain a sense of the nature, size and volume of the archive's holdings, which can make the choice of which writing to examine very difficult.

5.2 Despite the fact that MOP writing has been re-used and re-analysed by multiple researchers over time, we found that the archive did not easily provide as much meta-data (information that describes the written material, what it contains, and who has written it) as we needed in the planning of our project. We have not been able to gauge how much of a problem this lack of information has presented to other researchers from the various different disciplines that use the MOP. Discussions with the archivists who were able to recall other methodologies used, suggested that our approach to the archive, and our focus on meta-data, may be more overtly methodical than the approaches of some other social science researchers and researchers from other disciplines.

5.3 The relative lack of available, accessible meta-data meant that during the process of identifying what material we wanted to look at, we had to create our own meta-data whilst being aware that we may have been replicating the work of researchers who had visited the archive before us.

Currently available meta-data

5.4 At the time of writing, the University of Sussex Special Collections website[14] offers the opportunity for an 'advanced search' of the MOA catalogue and the people that have written for the MOP, through a search engine that was built in 2010[15]. However, in 2012 when we were designing our research project, glitches with the website meant that the search engine was often unavailable, requiring us to turn to the archivists for help.

5.5 Part of the website offers a list of the themes of all the different directives that have been sent out to writers since 1981. This gives information as to the year, and time of year that individual directives were sent out, and provides a pdf copy of each directive, enabling the researcher to examine the exact wording of a directive. However, the archive does not have a sophisticated working database of information on recurring themes across directives. Nor does the existing search engine allow for searches across the content of all directives. Each researcher has to read through all the directives, and talk to the archivists, in order to identify cross-directive themes. Identifying the directives that were relevant to the substantive aims of our project was a lengthy process. The archivists were extremely helpful in pointing us in the right direction. This led us to realise that the knowledge and expertise that the archivists hold actually forms part of the meta-data that is available about the MOP, albeit data that is only accessible through informal channels and through developing relationships with the MOP's curators.

5.6 When the archive's online database is functioning, it is possible to identify which writers have responded to different directives, and to search for a particular anonymised individual (for example, A883), to identify which directives that person has responded to (although the list provided is not in chronological order). However, the format in which these pieces of information exist does not allow the researcher to easily make comparisons of the response rates of different writers, or to gain an overview of all, or groups of, the MOP writers.

5.7 Given the planned longitudinal focus of the work we were proposing, this gap in the archive's meta-data was frustrating. We needed to identify which individuals had responded to the directives that were substantively relevant to our study, and to pull out those writers who fulfilled our sampling criteria (that is, those writers that we wanted to select because they had certain characteristics, such as length of time they had been writing for the MOP, see section below). Ideally, we needed a table that set all the directives against a list of all the writers who had ever written for the archive. Although the archivists were able to provide us with data on writers, this was in a format that then called for significant work before the relevant details were readily accessible. Since creating a database that met our needs, we have been asked for, and supplied copies to two other researchers (both of whom are historians); we have also heard that the archivists have used some of our findings to provide another researcher with information on writers. It is clear that there is an appetite for this type of information amongst the research community.
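The table described in 5.7 can be sketched in a few lines of Python. This is a minimal illustration only, not the database built for the project: the writer IDs, directive labels and the function name `build_response_table` are all hypothetical, and the sketch assumes the archive's information has already been reduced to simple (writer, directive) pairs.

```python
from collections import defaultdict

# Hypothetical response records: (anonymised writer ID, directive label).
# These IDs and labels are invented for illustration; they are not archive data.
responses = [
    ("W001", "1981 Spring"),
    ("W001", "1990 Autumn"),
    ("W002", "1981 Spring"),
    ("W003", "1990 Autumn"),
    ("W003", "2012 Spring"),
]

def build_response_table(records):
    """Return (directives, table), where table maps each writer to a
    dict of directive -> True/False (responded or did not respond)."""
    # Labels that start with the year sort into year order as plain strings.
    directives = sorted({directive for _, directive in records})
    answered = defaultdict(set)
    for writer, directive in records:
        answered[writer].add(directive)
    table = {
        writer: {d: (d in answered[writer]) for d in directives}
        for writer in sorted(answered)
    }
    return directives, table

directives, table = build_response_table(responses)

# Print one row per writer: 'x' marks a response, '.' marks a gap.
for writer, row in table.items():
    print(writer, " ".join("x" if row[d] else "." for d in directives))
```

Laying the information out this way makes gaps in an individual's contributions visible at a glance, which is the overview that the per-writer lists described in 5.6 do not readily provide.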

5.8 In summary, the needs of our project meant that the meta-data available through the archive was relevant but not formatted in a user-friendly way. It is possible that the demand for this kind of information has not been visible until recently; perhaps due to the fact that a large proportion of researchers using the archive have done so by focusing on responses to a given theme at given points in time (a cross-sectional approach) rather than by tracing the same writers across a range of themes over time (a longitudinal approach). It is also possible that each individual researcher has created, and held on to, their own meta-data, without making this available to other researchers. Since the start of our research project we have heard of several other academics (from different disciplines) interested in using the MOP for longitudinal work. The provision of accessible meta-data would support these individuals in their work, as well as encourage others to consider using the MOP as a source of data for secondary analysis, either cross-sectional or longitudinal, when planning a research project.

5.9 Reflecting on this experience in the context of the current funding environment and the focus on re-use of data, the investment in accessible meta-data appears a worthwhile one (the archivists at the MOA are very aware of the shortcomings of the currently available metadata). Typically, scoping work takes place prior to the awarding of funding for a project, which means that ease and cost-effectiveness of access to meta-data is a crucial consideration. Additionally, the research community and the archive would benefit from effective meta-data collection and collation, and more effective online search engines, to ensure that researchers using the archive are not wasting time and Research Council money by constantly repeating the same processes of collection, collation and analysis of the same pieces of meta-data. Having identified this need, some of the project team have worked in partnership with the MOA and researchers from the Universities of Surrey and Birmingham to put together a 'metadata project' proposal to the ESRC, through the Secondary Data Analysis phase II initiative. The project aims to bring together quantitative and qualitative anonymised MOA metadata into one searchable, online database, enabling current and prospective MOP users to access this information. We are waiting on news as to whether the project will be funded. In the meantime, it would be useful if the MOP researcher community, which exists across a range of disciplines, saw fit to share its meta-data with the archive, so that this can be made accessible to other researchers in that community.

Using metadata to select MOP writers for our study

5.10 The group of MOP writers participating between the early 1980s and the present day has varied in size over time, with the archive hosting as many as 1000 volunteer writers in its early days, when the archive accepted all volunteers offering to write.[16] The archive can provide anonymised basic profile information on its writers (although this is not systematically updated), which includes year of birth, place of residence, gender and occupation. This metadata indicates that in the 1980s, those most likely to volunteer were middle-class females, over 30 years of age, and living in the south-east. The MOA has since determined that current resources allow for the effective management of a panel of 500 writers, and although writers still volunteer, and are thus self-selecting, the MOA now has more formal selection criteria. Not all writers registered with the MOP respond to every directive, meaning that there are people who have contributed for a long time but have gaps in their contributions.

5.11 The demographic characteristics of MOP writers, or rather the issue of writers' representativeness, has been contested and discussed at length in Mass Observation Conferences and in published pieces (see for example, Pollen 2013; Thomas 2002; Shaw 1994). Some scholars argue that the archive does not need to be representative (Shaw 1994) and others argue that steps could be taken to make it more representative[17]. That this is a hotly-debated issue amongst users of the archive and the archivists (who try to represent the interests of the archive users as well as those of archive writers, and the archive itself) is testimony to the diversity of epistemological approaches that MOP scholars take, whose idea of 'good practice' regarding writer-selection differs across disciplines. This tension is reflected in the recent writer-recruitment strategy of the MOA archivists, with the MOP striving to recruit writers who more closely reflect the demographic make-up of the United Kingdom (UK). Currently, eligibility to join the full panel is contingent on being a young male living in the north east of England. Despite these efforts, the debate on representativeness of the 1981-2013 MOP panel continues.

5.12 Although not dismissing the issue of representativeness, our view is that it might be more helpful if the MOP research community (including potential new users of the MOP) were to shift its focus away from the question 'Are the MOP writers representative?' to the question 'Who are the MOP writers?'. At present the MOA can only provide very basic demographic information on its writers. The archive has not had the skills and resources with which to fully collate and anonymise the demographic information that it holds on MOP writers and make this available to its users. Neither has it had the skills and resources with which to analyse this demographic information properly. If this data were to be collated and analysed by skilled mixed-method researchers, would-be users of the archive could know who the writers are, and how they compare with the broader UK population. Potential users could then make informed, confident decisions on sampling, and the appropriateness of using MOP writers for their research; and the archive could make informed decisions on how it recruits its panel in the future. This argument forms the basis of the funding proposal discussed above. If funded, it is our hope that this work may settle, or at least better inform, the 'representativeness debate'.

Our selection criteria

5.13 These demographic data would have been very useful to us when we set about selecting which writers we should focus on for the project. Our selection criteria were driven by the limited available funding for Secondary Data Analysis Initiative Phase I projects[18]. This restricted us to sampling a relatively small number of individuals from the writers' panel whose writing we could follow through the 31-year time period. Our budget allowed us to select up to 40 individuals and to analyse their writing across the 15 directives that are relevant to the substantive aims of the project. The processes by which we chose these individuals illustrate a set of challenges that are specific to re-using data in a mixed methods project.

5.14 We considered a range of criteria for the selection of writers, taking into consideration different schools of thought regarding the importance of representativeness and completeness of data when sampling. Our first consideration was response rates. Using our carefully-crafted database of directives fielded against writers, it was possible to identify all the individuals who had contributed to the full 15 relevant directives, followed by those who had responded to 14 out of 15, then 13 out of 15, and so on. This approach yielded a cohort of writers, the majority of whom are female, almost all of whom are now in retirement, and most of whom started writing for MOP in their mid-30s or later[19]. We questioned whether this would allow us to fully explore discourses around civic engagement at different stages in the life course and across genders. We discussed selecting a larger number of male writers, and writers who are not yet retired, and thus entered into our own debate on representativeness of the MOP panel, and of the writers in our study.
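The response-rate ordering described in 5.14 can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the procedure actually used: the data, the function name `rank_by_response` and the alphabetical tie-break between writers with equal counts are all hypothetical, since the text does not say how ties were resolved.

```python
# Hypothetical data: writer ID -> set of directives that writer answered.
# IDs and directive labels are invented for illustration only.
answered = {
    "W001": {"D1", "D2", "D3"},
    "W002": {"D1"},
    "W003": {"D1", "D2"},
    "W004": {"D1", "D2", "D3"},
}
relevant = ["D1", "D2", "D3"]  # stands in for the 15 relevant directives

def rank_by_response(answered, relevant, n):
    """Return the n writers who answered the most relevant directives,
    as (writer, count) pairs, highest count first."""
    relevant_set = set(relevant)
    counts = {w: len(ds & relevant_set) for w, ds in answered.items()}
    # Assumption: ties are broken alphabetically by writer ID.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return [(w, counts[w]) for w in ranked[:n]]

first_cohort = rank_by_response(answered, relevant, 2)
print(first_cohort)  # [('W001', 3), ('W004', 3)]
```

Ranking purely on counts reproduces the pattern noted in the text: whoever has written longest and most consistently rises to the top, regardless of gender, age or occupation.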

5.15 During this selection process we were also aware that the available demographics of the MOP writers that we had collated mirror what we know about the demographics of volunteers, the so-called 'civic core', which in the UK consists of older, middle-class females from the south-east (Mohan and Bulloch 2012). As representatives of the 'civic core' the MOP writers are ideally situated to discuss and explore issues around civic-engagement, the substantive focus of our research. Therefore we agreed that we should not try to manipulate our selection so much that we lost this particular characteristic. Additionally, we were concerned that by trying to ensure some sort of additional representativeness beyond the group's civic engagement, we would run the risk of diluting the group, and would potentially miss unique unlooked-for voices amongst the panel. We decided that we would select our first 20 writers on the basis of their response rates to the relevant directives. After analysing their writing we would use this writing to guide us in our selection of the next 20 writers, by either identifying voices that were missing from the first cohort, or by identifying a particularly interesting sub-group that we would like to investigate.

5.16 Unfortunately, this careful plan was stymied by delays in the delivery of the first cohort of transcriptions (see below) and a deadline imposed by the planned closure of the archive for its move to different premises, which meant we had to select and access the writing of our second group of writers before the contributions of the first 20 writers could be analysed. This lack of time forced a decision, and we reluctantly resorted to the more traditional and pseudo-representative sampling/selection methods that we had rejected earlier; selecting a younger mixed-gender cohort with good response rates; but also selecting on occupation as a very loose indicator of class and educational background[20].

5.17 On reflection, perhaps Savage's (2007) 'random' sampling technique of simply selecting writers with 'surnames starting with A and B'[21] might have been truer to our initial aims. Yet this technique itself might be flawed, given that in the MOP writers with the prefixes A-B tend to sit at the top of the boxes in which writing is filed, and thus are easier and quicker to access than those in the middle or the bottom of the box. It would be interesting to know how many other MOP researchers have selected writing on the basis of ease of access; and, if this is the case, whether A-B is, in fact, an over-used sample. Ease of access and over-use of particular groups of writers may also be an issue for the future; with the increasing digitisation of responses, researchers may well begin to choose writers on the basis of ease of electronic access to the material, an issue which is discussed further below.

Data preparation

Analytical approach

6.1 Although selection of our writers represented a challenge, considering how we would go about analysing their writing represented an additional challenge. This was to have an impact on how we retrieved and copied individual MOP scripts. After considerable thought and discussion, we chose to use the software program MAXQDA as a tool for analysis of the scripts selected for our study. There has been criticism of the use of this sort of tool to analyse MOP writing. Pollen (2013), for example, describes this as 'data-mining'; and there is concern across the different disciplines that in using software tools that provide lexical searches and counts, and that calculate aggregates, the researcher risks losing the rich individual voice of the MOP writer. However, the work of the Computer Assisted Qualitative Data AnalysiS (CAQDAS) Networking project has convincingly made the case that computer assisted qualitative data analysis is not in itself a method or methodology, but rather it represents a set of tools that are flexible enough to adapt to a range of analytical approaches (Rivers and Bulloch 2011) that might be used by researchers in the humanities as well as the social sciences. Used responsibly, the tool is just as flexible as traditional analytical tools.

6.2 Our view is that MAXQDA enables the researcher to focus both on the individual writer and on a group of writers. It allows the researcher to look for continuity and change across an individual's life course (a diachronic approach) and appreciate the rich texture of that individual's life, whilst also looking for patterns and trends amongst a group of individuals at given points in time (a synchronic approach). Although it is a tool favoured by social scientists, it can be used effectively across disciplines to support the chosen method of analysis without losing the richness of the MOP material.
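The distinction between the diachronic and synchronic readings can be illustrated with a short Python sketch. This is not MAXQDA functionality and does not describe how the project's data are stored; the record layout, the function names and the placeholder texts are assumptions made purely for illustration.

```python
from collections import defaultdict

# Each record is (writer identifier, directive year, text).
# The texts are placeholders, not quotations from the archive.
scripts = [
    ("W729", 2010, "placeholder: account of a working day"),
    ("A883", 1983, "placeholder: response on paid work"),
    ("W729", 1983, "placeholder: response on paid work"),
    ("A883", 2010, "placeholder: account of a working day"),
]

def diachronic(records):
    """Group scripts by writer, in date order: one life followed over time."""
    by_writer = defaultdict(list)
    for writer, year, text in sorted(records, key=lambda r: r[1]):
        by_writer[writer].append((year, text))
    return dict(by_writer)

def synchronic(records):
    """Group scripts by year: many writers compared at one point in time."""
    by_year = defaultdict(list)
    for writer, year, text in records:
        by_year[year].append((writer, text))
    return dict(by_year)

one_life = diachronic(scripts)["W729"]   # W729's scripts, earliest first
one_moment = synchronic(scripts)[1983]   # all writers responding in 1983
```

The same set of scripts supports both readings; only the grouping key changes, which is why a single tool can serve both the individual and the group level of analysis.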

Preparing MOP scripts for analysis

6.3 Although MAXQDA software is very versatile, it requires all text to be in a word-processed format before it can be imported into the program for analysis[22]. The format of scripts submitted to the MOP has varied over the last thirty years, but most are only available in hard copy format. Scripts from the 1980s are mostly handwritten; and those from the 1990s and early 2000s are a mix of hand-written, typed and word-processed formats. It is only recently, with the MOP's introduction of an email submission system, that respondents have begun to submit their word-processed scripts by email[23]. Given that the majority of scripts are in hard copy format, we sought to digitise them - first by scanning them and converting them into pdf format, and secondly by transcribing them so that they became machine-readable. We were aware that other researchers may have already transcribed some of the writing that we had chosen to use, and there was a sense, again, of potentially replicating work already undertaken. With this, and the ESRC's aims for re-use of data/material in mind, we have, therefore, chosen to donate the pdfs and transcriptions back to the archive at the end of the project, for potential re-use by other researchers.

6.4 The decision to transcribe written texts took us out of familiar sociological territory. Traditionally, the medium of sociological qualitative enquiry tends to be the interview, which is translated into text by audio-transcribers. As sociologists we were unfamiliar with the tropes of the archive and how we should deal with archival material. Two issues emerged. Firstly, we identified that there were ethical issues relating to quotation of the MOP writing. When quoting an MOP writer, should spelling and grammatical mistakes be replicated? And if so, should our transcriptions be faithful to spelling and grammatical errors? We considered this by trying to place ourselves in the shoes of the writers. MOP writing (particularly the handwritten pieces) often takes place in one sitting, with free flow of ideas and emotions, and is produced when the writer is in a safe, private space. As MOP writers, would we feel embarrassed if someone replicated our spelling mistakes? Would we feel that in some way our ideas were being misrepresented? Would we feel that the replication of errors took the focus away from the ideas we were trying to impart? On the other hand, would we feel patronised if someone corrected our spelling and grammatical mistakes? Would this make us feel that our private writing space had been invaded or sullied? We consulted the archivists to ask how others quoted MOP material, and were informed that researchers tend to reproduce writers' spelling and grammatical errors. We followed this precedent by seeking to transcribe the 'warts and all' versions of scripts.

6.5 Secondly, we considered what the scripts in their physical format might tell us about the writer. A face-to-face interview with a respondent will give the researcher visual and aural information, or signifiers, that are processed in order to form value-driven impressions about the respondent, relating to an interviewee's class, educational background, levels of wealth, mental health and personality. These impressions may or may not be recorded in field-notes but are likely to influence the way in which an interview is conducted, and may influence the way in which the interviews are analysed. In much the same way, the physical form that a script takes provides signifiers, which are encoded with meanings by the reader. By decoding the script (and acknowledging the value-driven nature of this decoding), by looking at the type of paper on which it is written, the stains it has acquired, the handwriting, the spelling, the colour of the ink, the reader forms an impression of the writer's level of education and class, favoured beverage, the level of care given to the theme on which they are writing and perhaps, whether they have written the piece in one sitting[24]. In this sense, the physical script itself is an artefact, and the word-processed transcription is a translation of that artefact. We needed, therefore, to decide what we valued about the scripts' physical format, and to consider what potential re-users of the transcriptions in the future might also value. We came to the decision that spelling, grammar, format (i.e. where the words were placed on the paper), editing, and changes in pen colour are all valuable pieces of information, which should be carried through into the transcription process. We agreed that transcripts should be analysed alongside the scanned pdfs of the script; and that we would create 'field-notes' discussing how the physical scripts, as well as the views expressed in the writing, had influenced and affected us; and that this information should form part of the project's meta-data and research archive.

The transcription process

6.6 When planning the project prior to submission of the funding bid, we had researched how long it would take to transcribe up to 600 MOP scripts of differing lengths, and what this would cost. We sought the views of researchers who had commissioned transcriptions of MOP scripts previously, and we also sought the views of historians, who might be familiar with archival transcriptions. We were pointed to a company of audio-transcribers which had previously undertaken MOP transcriptions. The company costed-up the work, gave an indication of the time it would take, and we were able to include this in the bid.

6.7 Having received approval from the ESRC to start the project, our first task was data preparation. We chose our first 20 writers, digitised their writing into scanned pdf form, and tried to commission the transcription work. The company that had originally costed the work failed to appreciate our need for a swift turnaround of the work. A second company of audio-transcribers was recommended to us. Despite being given a very careful brief, the company spell-checked and formatted the transcripts, thus removing much of the information that we had valued. A third company of transcribers, claiming that hand-written texts were their speciality, was approached. This company promised accuracy and a swift turnaround, so we commissioned them to transcribe our first 20 writers' scripts. Our deadline for delivery of the transcripts, on which our choice of the next writers was contingent, came and went. When, eventually, the transcripts arrived, some were missing and the transcription of the handwritten texts contained substantial errors. We then interviewed and hired some temporary workers, attached to the university, to undertake quality checks of the returned transcripts, and finish the transcriptions of our 38 writers.

6.8 Our learning through this difficult process has been that close, face-to-face supervision of the transcription process is necessary; individual handwriting changes and becomes harder to read as a person ages and thus is more difficult to transcribe; accurate renditions of MOP scripts require a very methodical approach; and our costings and time-frame were inaccurate. On the positive side, we are now in a position to identify the cost, and length of time it takes to undertake transcriptions of MOP scripts for future research projects, and we hope that this may be of use to researchers from all disciplines in the future. A further learning point that we have taken on is that even good transcribers have failed to format the scripts in the same way as the original. Having undertaken the initial coding and analysis of the uploaded transcriptions, our view is that even an accurate transcription of a writer's script is no substitute for a copy of the original script - we have had a copy of each script alongside us as we have coded the transcriptions. If we undertook this exercise again, we would not insist on faithfulness to formatting.

Early analytical questions

7.1 At the time of writing this paper, our analysis of the MOP writing is still in its early stages; we are currently grappling with issues relating to the longitudinal analysis of writing and the unique nature of MOP writing[25].

7.2 At the time of writing, our longitudinal questions relate to life-course. How do we differentiate between a writer like W729, penning a response to a directive on paid work in 1983 at the age of 26, and the W729 who writes about her working day in 2010 at the age of 53? If we analyse an individual's responses to 15 directives across the period 1981-2012, how do we take account of what has happened in that individual's life-course in the years between these directives? And how do we analyse potential changes, shifts and palimpsests in an individual's explanation, understanding of, or narrative of an event, when different questions are asked; or the same questions are asked at different periods of time? Analysis of MOP writing demands that the sociologist (and researchers from other disciplines) grapples with theories of individual, private, cultural, or collective recall, memory and identity, and with theories of time[26].

7.3 What influence does the directive itself have on an individual's writing? Writers' responses tend to mirror the directive in shape/structure/themes covered. For example, Sheridan (1993) suggests that if a directive is short in length, then writers will provide short answers. Despite this mirroring, responses to a directive often come with gaps in individual narratives, which can be seen by the researcher as a failure to respond to a specific question. This is the type of gap that would not be found in a structured or semi-structured interview (an interview with a set agenda) conducted by a rigorous interviewer. Conversely, responses also come with rich autobiographical stream-of-consciousness riffs and tangents that can be off-topic, yet can contain research 'golden nuggets', which, in an interview scenario, a very rigorous interviewer might have prevented from taking place.

7.4 The way in which both writers and researchers engage with directives is an argument for the need for researchers to take into account that the directive itself is also a form of data, or research material, in the same way that an interview schedule is also data because it shapes and forms the responses given. Nevertheless, we also need to treat the directive, and individual responses to it, as a different type of data, or material, to the interview with which we are more familiar. Although the data/materials that we are dealing with are qualitative in nature, some of the analytical challenges in handling these data are similar to those thrown up during the re-use of quantitative survey data.

7.5 What also stands out in this early stage of the research process is that there is no physical connection between the researcher and the writer, of the sort one would encounter in a face-to-face, or even telephone, interview. However, this is not to say that there is no relationship between the researcher and the writer. The researcher is a receptive audience of the writer, although the writer may not have the researcher in mind when writing, as Shaw (1994 and 1996) argues when she discusses the real and imagined relationships between the writer and the archivist, comparing these to the type of relationships that exist between psychoanalyst and patient.

7.6 There is evidence to suggest that the writer is also aware of the researcher. Many of the writers converse with 'you' (the reader) and ponder on what the reader might make of a particular anecdote offered up. In their responses to the 2004 directive 'Being part of research' many writers clearly state that they are aware of their potential audience. We have also noted that where directives are commissioned by researchers, some writers try to pre-guess what the commissioners are looking for; thus pushing at the boundaries of the researcher and researched relationship, and asserting the agency of the researched. It is also noticeable that some writers engage with the reader/researcher consciously and unconsciously. This can be through the way the writer engages and trusts the reader with private, personal information, or through the expression of viewpoints that might not be shared by the reader (for example, strong political views, or views on race and ethnicity). Or it may be through the narrative arcs that occur in some individuals' writing, where the writer is conscious of the audience and makes an effort to impress with knowledge, style of writing, or a particular view point; a form of writing that Sheridan (1993) refers to as a 'formal register'. This attempt at connecting with the reader can ebb and flow as the writer gets into his/her stride and loses and regains sight of the audience. There are, therefore, similarities in the relationship between researcher and the interviewee and the researcher and the writer; a relationship which can at times be contested, manipulated or controlled by the research subject, thereby challenging the agency of the researcher.

7.7 However, the nature of the archive forces the sociologist away from more traditional understandings of the subject of research, into the less familiar territory of the research subject as seen through the lens of other disciplines such as oral history, autobiography, cultural studies and critical theory. The MOP is not a resource that allows a researcher to stay within their individual discipline, be it sociology, history or literature. Rather, it insists on crossing disciplinary divides at all stages of the research process. We look forward to a final stage of analysis and writing up that draws on a variety of different methodological and disciplinary tools.

Conclusion

8.1 Although it records the early days of a longitudinal research project, the value of this article is its relevance to prospective users of Mass Observation Archive materials from across different disciplines, in that it provides practical insights and solutions to the challenges involved in researching MOP writing. The practical examples presented in this paper include a discussion of the difficulties arising from inaccessible MOP metadata, the debate on writer representativeness, the sampling of individual MOP writers, the preparation of data, and the sociological use of MOP material for longitudinal study.

8.2 These discussions demonstrate that the nature of the archive frames the experience of the secondary analyst, requiring movement between analytical cultures, different disciplinary languages and methods, and an engagement with the interdisciplinary. Thus the sociologist's expectations and habits are challenged when engaging with this socio-historical data source. The access routes to the data, its structure, the relationship between researcher and researched, as well as the agency of the researcher, require a particular type of engagement with the data that challenges pre-conceptions and discipline-bound methodological approaches.

8.3 This paper also contributes to a wider debate on the challenges faced by a social science community that is increasingly being encouraged to work in interdisciplinary ways, using data sources for secondary analysis that are traditionally associated with other academic disciplines, and creating informed analyses that themselves may have reach and significance to researchers from a variety of different disciplinary backgrounds. The use of non-traditional data-sources offers an opportunity for sociologists to do more than cross disciplinary borders, and to work in a way that Griffin et al (2006) describe as striving to push at the borders of disciplines that they are engaging with.

The research community and the Mass Observation Archive

8.4 We would like to make a final point that relates to the UK Research Councils' more recent focus on interdisciplinarity and secondary analysis of data, which may result in increased research interest in data sources like the MOA. Unlike some other data sources, the MOA is not run by a statutory body but by a charity. Although increased research interest may have many benefits for the profile of the archive, it is important that there is some balance to this relationship, and that the MOA receives collaborative and/or in-kind help from the research community. The raising of the MOP's profile as a data source is, nevertheless, a very exciting development. It opens up a space for the research community to tackle some of the practical challenges associated with interdisciplinary working, with initiatives like this special edition of Sociological Research Online. Whilst it is likely that the papers presented here differ fundamentally in terms of the language, epistemological assumptions and analytical approaches they use, placing them next to each other gathers scholarship around the MOP, heightening awareness of disciplinary traditions and creating a forum within which interdisciplinary working may be possible.


Acknowledgements

Funding from the ESRC Secondary Data Analysis Initiative Phase 1 [ESRC Grant ES/K003550/1] is gratefully acknowledged.
The authors are grateful to Ros Edwards at the University of Southampton, and Jessica Scantlebury and Kirsty Pattrick at the Mass Observation Archive, for help with this article.


Notes

1See ESRC website page on Impacts and Findings http://www.esrc.ac.uk/funding-and-guidance/guidance/applicants/iii.aspx accessed June 2013.

2For example 'Mass Observing Today: Opportunities for new research' a day conference held in London in 2013, and the 'Mass Observation Anniversaries Conference' a three day event held at the University of Sussex in 2012.

3See ESRC website on Secondary Data Analysis Initiative Phase 2 funding opportunity http://www.esrc.ac.uk/funding-and-guidance/funding-opportunities/26605/secondary-data-analysis-initiative-phase-2.aspx accessed July 2013.

4See ESRC website on news and events http://www.esrc.ac.uk/news-and-events/announcements/25683/big-data-investment-capital-funding.aspx accessed September 2013.

5Email from Helen Robinson on behalf of Helen Barnard, Wednesday July 17 201. The email does not explain why this decision has been made, but it does clarify the data JRF intends to use "We are particularly interested in projects using the new Census and Understanding Society datasets, but will also be looking at projects using other datasets and involving mixed methods." This indicates a quantitative preference, and the reference to mixed-methods suggests a distancing away from purely qualitative work in the future.

6We note that the Phase 2 call differs from the Phase 1 call, in that it includes a paragraph encouraging the use of qualitative data.

'Qualitative data (non-numeric information) such as in-depth interview transcripts, diaries, anthropological field notes, answers to open-ended survey questions, or audio-visual recordings and images and mixed methods approaches combining qualitative data with numeric data can also be considered for further analysis.'
See ESRC website on Secondary Data Analysis Initiative Phase 2 funding opportunity http://www.esrc.ac.uk/funding-and-guidance/funding-opportunities/26605/secondary-data-analysis-initiative-phase-2.aspx accessed July 2013.

7See the ESDS Qualidata site (now part of the UK Data Service), for further suggested reading: http://www.esds.ac.uk/qualidata/support/reuse.asp accessed July 2013.

8 When this project was being set up the MOP was housed in the Special Collections section of the University of Sussex library. The archive has since moved to The Keep, a new purpose-built building adjacent to the University of Sussex, and re-opened in late 2013. http://en.wikipedia.org/wiki/The_Keep,_Brighton.

9See ESRC website on Secondary Data Analysis Initiative Phase 1 funding opportunity http://www.esrc.ac.uk/funding-and-guidance/funding-opportunities/19214/secondary-data-analysis-initiative-phase-1-2012.aspx accessed July 2013.

10Conference celebrating the 75th anniversary of the MOA held at the University of Sussex, July 2012. See MOA website http://www.massobs.org.uk/conference.htm. The MOA refers to the collection of pre 1981 writings, as well as to the post 1981 collection, called the Mass Observation Project.

11The cluster proposal has been submitted and is in the process of being reviewed by the ESRC. However, even if the proposal is unsuccessful we aim to write about the process of bringing these different proposals together.

12See footnote 8.

13Although a recent project entitled 'Observing the 80's' has digitised a section of the material for that decade, and this can be accessed through the internet http://blogs.sussex.ac.uk/observingthe80s/.

14See MOA website http://www.massobs.org.uk/index.htm accessed July 2013.

15See University of Sussex Special Collections Website http://specialcollections.lib.sussex.ac.uk/CalmView/How.aspx accessed July 2013. The advanced search option offers the opportunity to search by catalogue, place or person. When accessing these sub-headings the investigator is informed that the sub-heading is a 'test header'.

16The MOA archivists report that more than three and a half thousand individuals have written for the MOP since 1981.

17For example there were several heated debates reflecting these views during questions and discussions at 'The Mass Observation Anniversaries Conference: Seventy five years of Mass Observation & thirty years of the Mass Observation Project' held at the University of Sussex, July 2012.

18Funding is capped at £200,000. Once the cost of salaries is paid for, there is little left for additional spending.

19The exception is W729 who was 26 when she first began writing for the MOP.

20For an analysis of class and the MOP, see Savage 2007 and 2010.

21All writers have been assigned numbers with alphabetical prefixes (e.g. A883). Archivists have indicated that the 'rules' relating to allocation of these prefixes have changed over the years, and that the prefix does not necessarily relate to the writer's last name.

22Unless the text is treated as a picture, which greatly reduces the analytical functions available.

23Uptake of this method is increasing. When the authors commissioned a directive in 2012, approximately forty per cent of the responses were submitted electronically, enabling the authors to load these electronic scripts straight into the MAXQDA software.

24The changing nature of submissions to the MOP, to word-processed format, is starting to change the nature of the archive. Word-processing allows untracked editing to take place, so that writing has the capacity to be more polished, more thought-through, and less fluid. Although this makes for potentially better quality writing, the researcher may not get the first free-form thoughts of the writer, which is one of the positive qualities of the archive.

25For further information see http://longitudinalvolunteering.wordpress.com/2014/03/.

26See for example, Halbwachs 1992, Middleton and Woods 2000, Perks and Thomson 1998.


References

BRUCE, A, Lyall C, Tait J, and Williams R (2004), Interdisciplinary integration in the Fifth Framework Programme, Futures, Vol. 35, no. 4, p.457-470. [doi://dx.doi.org/10.1016/j.futures.2003.10.003]

CROW, G Edwards R, Nind M, Wiles R (2011) Opportunities for methodological synergies at the boundaries of the social sciences and the arts and humanities: Report prepared for the ESRC by the NCRM Southampton: NCRM.

GRIFFIN, G, Medhurst P, and Green T (2006) Interdisciplinarity in Interdisciplinary Research Programmes in the UK. University of Hull http://www.york.ac.uk/res/researchintegration/Interdisciplinarity_UK.pdf.

HALBWACHS, M (1992) On Collective Memory ed and trans. L A Coser. Chicago and London: University of Chicago Press.

HICKS, D M and Katz J S (1996) Where is science going? Science, Technology & Human Values, 21, p.379-406. [doi://dx.doi.org/10.1177/016224399602100401]

IRWIN, S and Winterton M (2012a) Qualitative secondary analysis and social explanation Sociological Research Online 17 (2) http://www.socresonline.org.uk/17/2/4.html. [doi://dx.doi.org/10.5153/sro.2626]

IRWIN, S and Winterton M (2012b ) 'Qualitative Secondary Analysis: A Guide to Practice', Timescapes Methods Guide Series, http://www.timescapes.leeds.ac.uk/.

IRWIN, S (2010) Working across qualitative and quantitative data. Childhood, youth and social inequalities Forum 21. European Journal on Child and Youth Research 6 (12): p.58-63

LANGHAMER, C 2008, Session 4: The commissioner, the re-user and the director for NCRM Seminar Series on Archiving and Reusing Qualitative Data, University of Sussex, November 10 2008. http://www.archive.cresc.ac.uk/events/archived/archiveseries/papers.html.

MASON, J (2007) 'Re-using' Qualitative Data: on the Merits of an Investigative Epistemology' Sociological Research Online, 12 (3) http://www.socresonline.org.uk/12/3/3.html. [doi://dx.doi.org/10.5153/sro.1507]

MIDDLETON, P and Woods T (2000) Literatures of Memory: History, time and space in post-war writing. Manchester: Manchester University Press.

MOHAN, J and Bulloch S (2012) 'The idea of a 'civic core': what are the overlaps between charitable giving, volunteering, and civic participation in England and Wales?' Third Sector Research Centre Working Paper 73. http://www.tsrc.ac.uk/Publications/tabid/500/Default.aspx.

MOORE, N (2007) (Re)Using Qualitative Data in Sociological Research Online, 12 (3) 3 http://www.socresonline.org.uk/12/3/1.html.

PERKS, R and Thomson A (1998) The Oral History Reader. London: Routledge. [doi://dx.doi.org/10.4324/9780203435960]

POLLEN, A (2013), Research Methodology in Mass Observation Past and Present: 'Scientifically, about as valuable as a chimpanzee's tea party at the zoo?' History Workshop Journal Advance Access published January 20, 2013 http://hwj.oxfordjournals.org/.

RIVERS, C and Bulloch S L (2011) CAQDAS - A contributor to social scientific knowledge? NCRM MethodsNEWS Spring 2011, Spring p.2-3. ISSN 1771.

SAVAGE, M (2007) Changing Social Class Identities in Post-War Britain: Perspectives from Mass-Observation Sociological Research Online 12 (3) 6 http://www.socresonline.org.uk/12/3/6.html. [doi://dx.doi.org/10.5153/sro.1459]

SAVAGE, M (2010) Identities and Social Change in Britain since 1940: The Politics of Method Oxford: Oxford University Press. [doi://dx.doi.org/10.1093/acprof:oso/9780199587650.001.0001]

SHAW, J (1994) Transference and Countertransference at the Mass Observation Archive: an Under-Exploited Research Resource Human Relations 47(11) p.1391-1408. [doi://dx.doi.org/10.1177/001872679404701105]

SHAW, J (1996) Surrealism, Mass-Observation and Researching Imagination in Lyon E S and Busfield J (eds) Methodological Imaginations. Explorations in Sociology 45 BSA. Basingstoke: MacMillan Press.

SHERIDAN, D (1993) Writing to the Archive: Mass-Observation as Autobiography Sociology 27 (1): p.27-40.

TAIT, J and Lyall C (2007) Short Guide to Developing Interdisciplinary Research Proposals, ISSTI Briefing Note (Number 1) March 2007: University of Edinburgh https://48822435ff65f9423681428795927e1499ecbbf7.googledrive.com/host/ 0B1arYAWdEZhkbnJjcEt2NlY4RTg/ISSTI_Briefing_note_1_developing_ID_proposals.pdf.

THOMAS, J (2002) Diana's Mourning: A people's history? Cardiff: University of Wales Press.