The work plan is divided into six tasks (A-F), which are briefly outlined below.
A. Specimen sampling (collection methodology)
Specimens of wild Atlantic cod (Gadus morhua) and common sole (Solea solea) are available to AquaGen through existing project collections (such as FishPopTrace) and have been genotyped for large numbers of SNP markers in each species. The exact geographic origin (date, region, longitude and latitude) and capture details of these samples have been documented.
Specimens of farmed sole have been collected from the North Sea and Mediterranean regions; additional farmed samples from the Irish Sea are being sought. Farmed cod samples are derived from the Faeroe Islands, with additional samples from Norway and Canada.
Cod Sampling
Wild cod: All sampled cod populations have been genotyped for 1536 SNPs, including SNPs associated with genes subject to selection. The dataset comprises 23 population samples, with approximately 40 individuals genotyped per sample, and constitutes an invaluable resource for comparing wild and aquaculture individuals.
Aquaculture cod: A large number of cod samples from aquaculture have been collected, together with associated data on broodstock and on the year and age of the fish. An existing collaboration with a cod breeding facility on the Faeroe Islands has supplied samples with known genetic relationships among individuals and family groups. These samples can be compared directly with wild individuals from the same region. The sampling of farmed cod will be extended with samples from the national cod breeding program in Norway and from the Canadian genomic and cod broodstock development program.
Sole Sampling
Wild sole: The sampling and analysis strategy is divided into an Atlantic and a Mediterranean component. Analyses will be performed at both the intra- and inter-basin level.
Atlantic samples: Samples of adult soles have been collected and genotyped with 450+ SNP markers. Temporal replicates are available for various regions. Some samples have additionally been genotyped with 15 microsatellite markers for comparison and power analysis.
Mediterranean samples: Samples of adult soles have been collected and already genotyped with 450+ SNP markers.
Aquaculture samples: Because many sole hatcheries still produce F1 generations from outbred wild parents (so that the offspring are genetically virtually equivalent to wild individuals), AquaGen focuses on the analysis of genetic diversity, parentage analysis and a simulation study based on existing and novel data from farmed individuals. From the Atlantic, aquaculture individuals from the North Sea and from the English-Irish region will be included. From the North Adriatic region of the Mediterranean, genotyped parents and their F1 offspring from a pilot farming of Solea solea will be used. Additionally, parents and offspring from the Tyrrhenian Sea will be analysed.
B. Genetic Markers
AquaGen will utilise panels of SNP markers developed and characterised for European populations of cod and sole as part of the EU FP7 FishPopTrace project. A total of 1536 SNP markers for cod and 450 SNP markers for sole have been identified, validated and screened across multiple populations throughout the species' European ranges (>1000 individuals per species). The resulting SNP panels have already demonstrated their utility for distinguishing closely related wild populations (FishPopTrace, unpublished data) based on a combination of neutral and non-neutral markers.
C. Marker characterisation and assessment
To address these objectives, genotype datasets must be established for wild and farmed cod and sole. SNP genotypes are already available for the majority of project samples. Wild population data were generated as part of the FishPopTrace project (500,000 genotypes for sole and 1.5 million genotypes for cod). Farmed sole from a North Sea breeding facility (two full-sib families, 96 samples) and a parental panel from the North Adriatic, together with cod from the Faeroe Islands, have been genotyped. Additional aquaculture samples will be genotyped and assessed, namely the Mediterranean and Irish Sea sole samples and all of the Norwegian and Canadian cod samples.
Marker characterisation and panel selection
The panels of SNPs used to trace fish back to a farmed or wild origin should ideally be minimized and optimized for traceability applications, both to reduce the cost of analysis and to ensure that genetic marker systems provide a financially viable solution for tracing fish and fish products. The characterisation, ranking and selection of the SNPs that offer the greatest discrimination for specific assignment questions has been the subject of a recent review (Helyar et al., 2011). The most suitable of these approaches will be applied to two problems: identifying wild versus farmed origin, and discriminating between individual aquaculture facilities.
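As a minimal sketch of single-locus ranking, assuming biallelic SNPs summarised by the alternative-allele frequency in each reference population, the Python below scores each locus with Nei's G_ST (a simple F_ST analogue without the sample-size corrections of estimators such as Weir and Cockerham's) and keeps the top-ranked loci. The function names and example frequencies are hypothetical. To avoid the upward bias in assignment power described by Anderson (2010a), such a ranking would be computed on a training subset of individuals rather than on the individuals later used to test assignment.

```python
from statistics import mean


def gst(freqs):
    """Nei's G_ST for one biallelic locus (illustrative estimator, no sample-size correction).

    freqs: alternative-allele frequency in each reference population.
    """
    p_bar = mean(freqs)                          # pooled frequency, equal weight per population
    h_t = 2 * p_bar * (1 - p_bar)                # expected heterozygosity of the pooled sample
    if h_t == 0:                                 # monomorphic locus: no discriminating power
        return 0.0
    h_s = mean(2 * p * (1 - p) for p in freqs)   # mean within-population heterozygosity
    return (h_t - h_s) / h_t


def rank_loci(freq_table, top_n):
    """Return the top_n locus names ordered by decreasing G_ST (hypothetical helper)."""
    scores = {locus: gst(freqs) for locus, freqs in freq_table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    # Hypothetical frequencies for three SNPs in a farmed and a wild sample.
    table = {"snp1": [0.5, 0.5], "snp2": [0.1, 0.9], "snp3": [0.3, 0.5]}
    print(rank_loci(table, top_n=2))  # ['snp2', 'snp3']
```

Panel size can then be varied by changing `top_n`, which mirrors the idea of testing panels of increasing size; multi-locus selection methods reviewed in the literature may outperform this simple one-locus-at-a-time ranking.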
Marker and method evaluation
Two analytical approaches for sample traceability will be evaluated as part of AquaGen: Population Assignment and Parentage Based Tagging. In addition, the marker type selected for this project, single nucleotide polymorphisms (SNPs), will be compared with the microsatellite markers traditionally used in population genetics. The following sections describe these core studies, which form the basis of AquaGen.
1. Population assignment (PA)
Population assignment exploits differences in marker allele frequencies among populations to identify the most likely population of origin for a given individual, based on its marker profile. Before assignment power is examined across marker panels, each marker will therefore be characterised in terms of its relative variation among populations, measured as F(ST), so that markers can be individually ranked according to their usefulness for the particular assignment question at hand. Assignment power is a function of the number of markers used and the level of genetic differentiation among populations observed at each marker. Several algorithms are available for PA, the most popular of which are implemented in the software GeneClass2 (Piry et al., 2004). This programme will be used to examine the assignment power of marker panels of increasing size across a number of example questions. Population assignment is usually combined with population exclusion, which measures the probability that a sample could have originated from each reference population. Because population assignment will always identify one of the reference populations as the most likely source, a separate exclusion test is important to account for cases in which the true population of origin is absent from the reference data. Exclusion probabilities will also be computed using GeneClass2. An accurate assessment of assignment power is needed to have confidence that any assay designed in silico will successfully assign actual test samples to their true origin. Recent research has indicated that assignment power estimates are frequently much higher than the true power provided by a marker set (Anderson, 2010a), typically because the same individuals are used both to select the most informative loci and to estimate assignment accuracy. In this project, the approaches suggested by Anderson for removing this bias will be implemented.
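The combination of assignment and exclusion can be illustrated with a short Python sketch. It uses a simple frequency-based likelihood criterion and a Monte Carlo exclusion test, in the spirit of the options offered by programs such as GeneClass2 but not their actual implementation. Assumptions: biallelic SNPs coded as alternative-allele counts (0, 1, 2), Hardy-Weinberg equilibrium within populations, independent loci, and an arbitrary frequency floor of 0.01 for alleles unseen in a reference sample; all names and example frequencies are hypothetical.

```python
import math
import random

FLOOR = 0.01  # assumed minimum allele frequency, so unseen alleles do not give log(0)


def genotype_prob(g, p):
    """Hardy-Weinberg probability of alt-allele count g (0, 1 or 2) at frequency p."""
    p = min(max(p, FLOOR), 1 - FLOOR)
    q = 1 - p
    return (q * q, 2 * p * q, p * p)[g]


def log_lik(genotypes, freqs):
    """Log-likelihood of a multilocus genotype; None marks a missing call and is skipped."""
    return sum(math.log(genotype_prob(g, p))
               for g, p in zip(genotypes, freqs) if g is not None)


def assign(genotypes, references):
    """Most likely reference population plus the log-likelihood for every population."""
    scores = {pop: log_lik(genotypes, f) for pop, f in references.items()}
    return max(scores, key=scores.get), scores


def exclusion_p(genotypes, freqs, n_sim=1000, seed=1):
    """Monte Carlo exclusion test: share of genotypes simulated from `freqs` that are
    no more likely than the observed one. Small values suggest the individual did not
    originate from this population, even if it is the best of the references."""
    rng = random.Random(seed)
    observed = log_lik(genotypes, freqs)
    below = 0
    for _ in range(n_sim):
        # Draw two alleles per locus; keep the observed pattern of missing calls.
        sim = [None if g is None else (rng.random() < p) + (rng.random() < p)
               for g, p in zip(genotypes, freqs)]
        if log_lik(sim, freqs) <= observed:
            below += 1
    return below / n_sim


if __name__ == "__main__":
    # Hypothetical reference frequencies for four SNPs.
    refs = {"farm": [0.9, 0.9, 0.9, 0.1], "wild": [0.1, 0.1, 0.1, 0.9]}
    fish = [2, 2, 2, 0]
    best, scores = assign(fish, refs)
    print(best)  # farm
    print(exclusion_p(fish, refs["farm"]), exclusion_p(fish, refs["wild"]))
```

Reporting the exclusion value alongside the best assignment is what allows a fish from an unsampled farm or wild population to be flagged rather than forced into the closest reference.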
2. Parentage Based Tagging (PBT)
An alternative to Population Assignment is Parentage Based Tagging, in which individuals are traced to their farm of origin by identifying the parents from which the aquaculture population was derived. PBT was developed to support Pacific salmon hatchery management in the USA, where it shows great promise as an alternative to the physical tagging of hatchlings, allowing salmon released into the wild to be identified to their hatchery of origin. PBT relies on the ability to identify or exclude familial trios (parent-parent-offspring) by comparing the individual genetic profiles recorded in hatchery databases with the profile of the fish to be identified. PBT does require that most fish in the parental generation are sampled (a few missing individuals can be accounted for), and ideally a fish should be known to be farmed, rather than wild, before PBT analysis is conducted. For Pacific salmon this is achieved by physically fin-clipping every hatchery fish and genetically analysing the fin tissue. While PBT requires multiple markers to enable accurate assignment, its power increases exponentially with the number of loci (Anderson and Garza, 2006) and, unlike PA, the method can effectively discriminate between fish from two farms (or populations) with near-identical allele frequencies. This is because, even in such a scenario, it is extremely unlikely that two pairs of parents share the same multilocus genotypes. A new software package for performing multiple parentage assignment tests, SNPPIT (Anderson, 2010b), will be implemented to evaluate PBT for origin identification in cod and sole.
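The trio check at the heart of PBT can be sketched in Python as plain Mendelian exclusion. SNPPIT itself uses a likelihood-based approach with a genotyping-error model; the code below is a deliberately simpler substitute, not SNPPIT's method. Assumptions: biallelic SNPs coded as alternative-allele counts (0, 1, 2), `None` for missing calls, a hypothetical broodstock table keyed by fish ID, and a user-chosen mismatch tolerance standing in for genotyping error.

```python
from itertools import combinations

# Alt-allele counts (0 or 1) that a parent with SNP genotype 0, 1 or 2 can transmit.
TRANSMIT = {0: (0,), 1: (0, 1), 2: (1,)}


def mismatches(offspring, parent1, parent2):
    """Count loci at which the offspring genotype cannot arise from the parent pair.
    Loci with a missing call (None) in any of the three fish are skipped."""
    n = 0
    for o, g1, g2 in zip(offspring, parent1, parent2):
        if None in (o, g1, g2):
            continue
        possible = {a + b for a in TRANSMIT[g1] for b in TRANSMIT[g2]}
        if o not in possible:
            n += 1
    return n


def compatible_pairs(offspring, broodstock, max_mismatch=1):
    """Return (parent_a, parent_b, mismatches) for every broodstock pair within the
    tolerance, best first. max_mismatch is an assumed allowance for genotyping errors."""
    hits = []
    for a, b in combinations(sorted(broodstock), 2):
        m = mismatches(offspring, broodstock[a], broodstock[b])
        if m <= max_mismatch:
            hits.append((a, b, m))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Hypothetical broodstock genotypes at four SNPs.
    brood = {"F1": [0, 1, 2, 2], "F2": [0, 0, 1, 1], "M1": [2, 2, 2, 0]}
    print(compatible_pairs([0, 1, 2, 1], brood, max_mismatch=0))  # [('F1', 'F2', 0)]
```

An offspring with no compatible pair would be reported as unassigned, which is how a fish from an unsampled broodstock (or a wild fish) would show up. With only four loci, raising the tolerance to one mismatch already admits the other two pairs, illustrating why many loci are needed.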
3. Simulated marker power comparison
Two principal marker types are currently employed for population genetic analysis in aquaculture and wild marine fish research: microsatellites and SNPs. While SNPs are increasingly considered the marker of the future for wildlife traceability and forensic applications (Ogden, 2010), a large body of microsatellite data has been amassed for cod and common sole, and many populations have already been genotyped with both marker types for reference building and inter-marker comparison. This study will therefore include a simulation analysis of the relative power of the two marker types under both the PA and PBT approaches, to help inform future decisions regarding genetic markers for fisheries traceability.
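A quick analytical counterpart to such a comparison, offered as a sketch rather than the planned simulation design, is the probability that a random unrelated pair is excluded as the parents of a random offspring at one locus, obtained by enumerating Hardy-Weinberg genotypes. Assuming unlinked loci and no genotyping error, per-locus non-exclusion probabilities multiply, which is why exclusion power grows so quickly with the number of loci. The allele frequencies (a SNP at 0.5/0.5 and a hypothetical microsatellite with ten equally frequent alleles) and the 0.999 target are illustrative assumptions, not project data.

```python
from itertools import combinations_with_replacement, product


def hwe_genotypes(freqs):
    """All unordered genotypes (i, j) with their Hardy-Weinberg probabilities."""
    out = []
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        prob = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
        out.append(((i, j), prob))
    return out


def can_parent(child, g1, g2):
    """True if one child allele can come from g1 and the other from g2."""
    a, b = child
    return (a in g1 and b in g2) or (b in g1 and a in g2)


def trio_exclusion(freqs):
    """Probability that a random unrelated pair is excluded as parents of a random
    offspring at one locus (assumes random mating and no genotyping error)."""
    geno = hwe_genotypes(freqs)
    compatible = 0.0
    for (child, pc), (g1, p1), (g2, p2) in product(geno, repeat=3):
        if can_parent(child, g1, g2):
            compatible += pc * p1 * p2
    return 1 - compatible


def loci_needed(per_locus, target=0.999):
    """Number of independent loci with equal exclusion power to reach `target`."""
    if per_locus <= 0:
        raise ValueError("per-locus exclusion must be positive")
    non_excl, n = 1.0, 0
    while 1 - non_excl < target:
        non_excl *= 1 - per_locus  # non-exclusion multiplies across unlinked loci
        n += 1
    return n


if __name__ == "__main__":
    snp = trio_exclusion([0.5, 0.5])
    msat = trio_exclusion([0.1] * 10)  # hypothetical 10-allele microsatellite
    print(snp, loci_needed(snp))       # 0.28125 21
    print(msat, loci_needed(msat))
```

Real loci have unequal allele frequencies and genotyping errors, and candidate parents may be related, so a simulation on actual cod and sole data remains the more realistic test.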
D. Tool development
Data, results and resources emerging during the project will be uploaded to the JRC fish genetic database to ensure long-term availability and a high dissemination potential. It is envisaged that the project outputs will provide three types of information. At a specific level, panels of SNP markers capable of assigning individual cod and sole back to their origin should be produced. More generally, information on the most appropriate analytical methods for tracing fish should be available for each species. Lastly, the findings will be placed in the context of aquaculture management across all marine fish species targeted in Europe. All of these levels of output will be presented with respect to the long-term goal of developing applied tools to address traceability in a forensic context. The project team has extensive experience in transferring technology from baseline research through to applied forensic traceability tools. The lead partner (TRACE) is one of the world's leading organizations focusing specifically on the development and application of forensic techniques to wildlife trade regulation and law enforcement. TRACE has been invited to advise the EU CITES enforcement group on multiple occasions and routinely trains national authorities to implement forensic testing, including both field enforcement officers (customs officers and wildlife inspectors) and forensic scientists. In addition, the groups at KU Leuven, DTU-AQUA and UNIBO have worked on forensic casework for their respective national authorities.
E. Integration into international efforts and a forensic framework
AquaGen is an example of a new generation of cutting-edge genomic research projects focusing on the traceability of food and food products. Our ability to generate and apply genetic data to trace samples to their geographic or farmed origin is increasing rapidly. As such, AquaGen will sit alongside other projects that are developing DNA identification tools for use in trade regulation and management (e.g. the EU FP7 FishPopTrace project, the JRC's MerSNiP and SturSNiP projects and the UK Food Standards Agency's Meat Breed Verification project). We consider it important that AquaGen results are integrated into the emerging field of DNA-based traceability. With this in mind, consideration will be given to the format of project data, the potential to standardize protocols for downstream validation work and the development of a consensus on the statistical analysis and interpretation of results that may ultimately be used in a legal framework. The extension of the AquaGen approach to other farmed marine fish species, such as the Mediterranean aquaculture species European seabass (Dicentrarchus labrax) and gilthead seabream (Sparus aurata), will also be explored. The AquaGen consortium has extensive contacts and collaborations with organisations addressing aquaculture management and the environmental impact assessment of escapees. To achieve dissemination and uptake of AquaGen outputs by both the enforcement and aquaculture management communities, a strategy for promoting the project will be devised in conjunction with the JRC. This will involve standard approaches such as publication of the project report and submission of papers to peer-reviewed journals, but will also include outreach to the relevant international enforcement authorities, practicing forensic laboratories and the aquaculture industry, identified as the three principal stakeholder groups in this project.